Tuesday, October 30, 2012

Ruminating on Transaction Logs

Understanding how transaction logs work in an RDBMS is very important for application design. I found the following good articles that explain the important concepts in simple language:

http://www.simple-talk.com/sql/learn-sql-server/managing-transaction-logs-in-sql-server/
http://www.techrepublic.com/article/understanding-the-importance-of-transaction-logs-in-sql-server/5173108

Every database consists of log files and data files. In the SQL Server world, these are the *.ldf and *.mdf files respectively. All database transactions (modifications) are first written to the log file (write-ahead logging). A separate thread (or set of threads) periodically writes the modified pages from the buffer cache to the data file. Once the data is written to the data file, a checkpoint is written to the transaction log. This checkpoint is the reference point for "rolling forward": when the server is restarted after a failure, the committed transactions logged after the last checkpoint are re-applied to the data file (and uncommitted ones are rolled back). This prevents the loss of transactions that were in the buffer cache but not yet written to the data file.
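The log-first, checkpoint and roll-forward sequence can be illustrated with a deliberately tiny Java model. Everything here is an assumption for illustration: the class names are hypothetical, a map stands in for the data file, and every logged write is treated as committed. Real engines also roll back uncommitted transactions and record the checkpoint position in the log itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: not how any real RDBMS lays out its log or data files.
class ToyDatabase {
    static final class LogRecord {
        final String key;
        final String value;
        LogRecord(String key, String value) { this.key = key; this.value = value; }
    }

    final List<LogRecord> log = new ArrayList<>();          // durable: survives a crash
    final Map<String, String> dataFile = new HashMap<>();   // durable: survives a crash
    Map<String, String> bufferCache = new HashMap<>();      // volatile: lost in a crash
    int lastCheckpoint = 0;                                 // log position of the last checkpoint

    void write(String key, String value) {
        log.add(new LogRecord(key, value));   // 1. write-ahead: the log is written first
        bufferCache.put(key, value);          // 2. then the page is modified in memory
    }

    void checkpoint() {
        dataFile.putAll(bufferCache);         // flush dirty pages to the data file
        lastCheckpoint = log.size();          // remember how far the data file is up to date
    }

    void crash() {
        bufferCache = new HashMap<>();        // everything held only in memory is lost
    }

    void recover() {
        // Roll forward: re-apply every log record written after the last checkpoint.
        for (LogRecord r : log.subList(lastCheckpoint, log.size())) {
            dataFile.put(r.key, r.value);
        }
        bufferCache = new HashMap<>(dataFile);
        lastCheckpoint = log.size();
    }
}

class WalDemo {
    public static void main(String[] args) {
        ToyDatabase db = new ToyDatabase();
        db.write("acct:1", "100");
        db.checkpoint();                  // acct:1 is now in the data file
        db.write("acct:2", "250");        // acct:2 is only in the log and the buffer cache
        db.crash();
        System.out.println(db.dataFile);  // acct:2 is missing here
        db.recover();
        System.out.println(db.dataFile);  // acct:2 has been rolled forward from the log
    }
}
```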

Transaction logs are required for rollback, log shipping, backup, etc. The transaction log files should be managed by a DBA; otherwise we can run into problems when a log file fills up all the available disk space. The DBA should also back up the log files periodically. Typical log backup commands also truncate the log. In some databases (SQL Server, for example), truncation just marks old log records as inactive so that their space can be reused. In such cases the log file does not get any smaller even after truncation, and we have to use a different command to shrink (compact) the log file.
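The difference between truncating and shrinking is easy to miss, so here is a toy Java model of a log file's disk usage. It is a hypothetical illustration (the class name, the "slot" unit and the minimum size are assumptions), not how SQL Server manages virtual log files: truncation only marks space reusable, while shrinking actually gives space back.

```java
// Toy model of a log file's disk usage, measured in "slots" (one slot = one log record).
class ToyLogFile {
    private final int minimumSize;
    private int allocated;   // slots allocated on disk, i.e. the file size
    private int active;      // slots holding records still needed (not yet backed up)
    private int inactive;    // slots marked reusable by truncation

    ToyLogFile(int initialSize) {
        this.minimumSize = initialSize;
        this.allocated = initialSize;
    }

    void append(int records) {
        int neverUsed = allocated - active - inactive;    // allocated but never written
        int reused = Math.min(records, inactive);         // truncated space is reused first
        inactive -= reused;
        int fromNeverUsed = Math.min(records - reused, neverUsed);
        allocated += records - reused - fromNeverUsed;    // anything left over grows the file
        active += records;
    }

    /** Log backup + truncation: records become reusable, but the file does not get smaller. */
    void backupAndTruncate() {
        inactive += active;
        active = 0;
    }

    /** Shrink: hand the unused space back to the operating system. */
    void shrink() {
        allocated = Math.max(active, minimumSize);
        inactive = 0;
    }

    int sizeOnDisk() { return allocated; }
}
```

For example, appending 50 records to a 10-slot file grows it to 50 slots; a backup with truncation leaves it at 50 slots, and only `shrink()` reduces it.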

A good post on truncation options for SQL Server 2008 is given below:
http://www.codeproject.com/Articles/380879/About-transaction-log-and-its-truncation-in-SQL-Se

Wednesday, October 17, 2012

Performance impact of TDE at database level

I was always of the opinion that column-level encryption is better and faster than database- or tablespace-level encryption. But after much research into the internal workings of TDE (Transparent Data Encryption) on SQL Server and Oracle, full-database encryption does not look like a bad deal!

In fact, if we have a lot of columns that need to be encrypted, and we also need to run queries against those encrypted columns, then full database (tablespace) level encryption using TDE seems to be the best option.

I was a bit skeptical about the performance degradation of full-database TDE, but it may not be significant. First and foremost, column-level (cell) encryption can severely hamper the query optimizer: encrypted values typically cannot be used for index seeks or range searches without decrypting them, which can result in significantly worse performance than encrypting the entire database.
When we use TDE at the database (tablespace) level, the DB engine can use bulk encryption for entire blocks (pages) of data as they are written to or read from disk.

It is important to note that full-database TDE works at the data-file level, not at the table or column level. In other words, individual values are not encrypted one by one; instead, the data files as a whole (including index and log data) are encrypted on disk, and pages are decrypted as they are read into memory.
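To make the "bulk encryption of blocks" point concrete, here is a minimal Java sketch that encrypts a whole 8 KB page (the SQL Server page size) in one AES-GCM call when it is "written" and decrypts it in one call when it is "read". The class name PageCipher, the GCM mode and the IV-prefix layout are assumptions for illustration; this is not how SQL Server or Oracle implement TDE internally.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Sketch of page-level ("bulk") encryption: one cipher call per page, not per column value.
class PageCipher {
    static final int PAGE_SIZE = 8192;
    private static final int IV_LEN = 12;
    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    PageCipher(SecretKey key) { this.key = key; }

    static SecretKey newKey() throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        return kg.generateKey();
    }

    /** Called when a page is flushed to disk; returns IV || ciphertext || tag. */
    byte[] encryptPage(byte[] page) throws Exception {
        byte[] iv = new byte[IV_LEN];
        random.nextBytes(iv);                               // fresh IV for every write
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ct = c.doFinal(page);                        // the whole page in one call
        byte[] out = Arrays.copyOf(iv, IV_LEN + ct.length);
        System.arraycopy(ct, 0, out, IV_LEN, ct.length);
        return out;
    }

    /** Called when a page is read back from disk into memory. */
    byte[] decryptPage(byte[] stored) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, stored, 0, IV_LEN));
        return c.doFinal(stored, IV_LEN, stored.length - IV_LEN);
    }
}
```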

Microsoft states that the performance degradation of database-level TDE is a mere 3-6%.
Oracle states that in 11g, on Intel Xeon processors with the AES-NI instruction set, there is a "near-zero" impact on database performance.

Note the difference in terminology between Microsoft and Oracle. Microsoft uses "TDE" only for full-database encryption (not column-level), whereas Oracle distinguishes between TDE tablespace encryption and TDE column encryption.

Also, TDE is a proven solution from a regulatory perspective, e.g. for PCI DSS. Auditors are more comfortable approving a proven industry solution than custom logic implemented in application code.

Tuesday, October 16, 2012

Column Level Encryption in SQLServer

We were exploring the option of using cell-level (column-level) encryption in SQL Server. The option of using TDE (Transparent Data Encryption) was dismissed for performance reasons, and we only wanted to encrypt a few columns.

I found a nice tutorial that quickly explains how to create a symmetric key for SQL Server encryption. Excerpts from the blog:

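1. Create a certificate that will be used to encrypt our symmetric key. The certificate's private key is protected by the database master key, so create one first if the database does not have one yet.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'a strong master key password'; -- skip if a master key already exists
CREATE CERTIFICATE MyCertificateName
WITH SUBJECT = 'A label for this certificate';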
  
2. Create a symmetric key by supplying a passphrase (KEY_SOURCE) and an identity value that seeds the key's GUID (IDENTITY_VALUE).
CREATE SYMMETRIC KEY MySymmetricKeyName WITH
IDENTITY_VALUE = 'a fairly secure name',
ALGORITHM = AES_256,
KEY_SOURCE = 'a very secure strong password or phrase'
ENCRYPTION BY CERTIFICATE MyCertificateName;
 
To be able to replicate the key on another server, or rebuild it if it is corrupted, we must keep a very safe record of the KEY_SOURCE and IDENTITY_VALUE parameters (and the ALGORITHM), because these are what is used to create the key. The same values regenerate the same key.

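3. Encrypt the data (the symmetric key must first be opened in the session)
OPEN SYMMETRIC KEY MySymmetricKeyName DECRYPTION BY CERTIFICATE MyCertificateName;
SELECT EncryptByKey(Key_GUID('MySymmetricKeyName'), @ValueToEncrypt);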

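4. Decrypt the data
SELECT CONVERT(varchar(100), DecryptByKey(@ValueToDecrypt)); -- DecryptByKey returns varbinary; convert back to the original type
CLOSE SYMMETRIC KEY MySymmetricKeyName;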
 
The only parameter that the decrypt function needs is the data we wish to decrypt. We do not need to pass the key name to the decryption function; SQL Server determines which of the open keys needs to be used.
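As I understand it, this works because the ciphertext produced by EncryptByKey carries the GUID of the key that produced it. The toy Java model below illustrates that idea: each AES-GCM ciphertext is prefixed with a 16-byte key id, so `decrypt()` can find the right key among the currently "open" keys without being told its name. The class and method names are hypothetical, and the byte layout is not SQL Server's actual format.

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Toy model only: every ciphertext starts with the 16-byte id of the key that produced it.
class OpenKeyRing {
    private final Map<UUID, SecretKey> openKeys = new HashMap<>();
    private final SecureRandom random = new SecureRandom();

    static SecretKey newAesKey() throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        return kg.generateKey();
    }

    void open(UUID keyId, SecretKey key) { openKeys.put(keyId, key); }  // loosely like OPEN SYMMETRIC KEY
    void close(UUID keyId) { openKeys.remove(keyId); }                  // loosely like CLOSE SYMMETRIC KEY

    byte[] encrypt(UUID keyId, byte[] plaintext) throws Exception {
        SecretKey key = openKeys.get(keyId);
        if (key == null) throw new IllegalStateException("key is not open");
        byte[] iv = new byte[12];
        random.nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ct = c.doFinal(plaintext);
        return ByteBuffer.allocate(16 + iv.length + ct.length)
                .putLong(keyId.getMostSignificantBits())   // 16-byte key id header
                .putLong(keyId.getLeastSignificantBits())
                .put(iv)
                .put(ct)
                .array();
    }

    /** No key parameter: the key is looked up from the id stored in the blob. */
    byte[] decrypt(byte[] blob) throws Exception {
        ByteBuffer buf = ByteBuffer.wrap(blob);
        UUID keyId = new UUID(buf.getLong(), buf.getLong());
        SecretKey key = openKeys.get(keyId);
        if (key == null) return null;                      // key not open: no result
        byte[] iv = new byte[12];
        buf.get(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return c.doFinal(blob, buf.position(), buf.remaining());
    }
}
```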

Monday, October 15, 2012

Where is the private key of a Digital Cert stored?

Today, one of my team members asked an innocuous question: is the private key stored in a digital certificate?

The answer is an obvious 'NO', and it's unfortunate that so many folks still struggle to understand the basics of PKI. A digital certificate is essentially a container for your public key and identity details, plus the CA's digital signature over those contents (a hash of the contents, signed with the CA's private key). More information is available here.

I think a lot of people get confused because the key used differs by use case. For example, if I want to send sensitive data to a receiver, I encrypt the data with the receiver's public key. But if I want to sign a document, I use my own private key to digitally sign it.

This site contains a neat table that gives a good overview of which key is used when:


Key Function                 | Key Type    | Whose Key Used
-----------------------------|-------------|---------------
Encrypt data for a recipient | Public key  | Receiver
Decrypt data received        | Private key | Receiver
Sign data                    | Private key | Sender
Verify a signature           | Public key  | Sender
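The four rows of the table map directly onto standard Java Cryptography Architecture calls. The sketch below uses one key pair for the receiver and one for the sender; the class name is hypothetical, and RSA-2048, OAEP padding and SHA256withRSA are illustrative choices rather than requirements. Note that RSA can only encrypt small payloads directly; in practice it is used to encrypt a symmetric key, which then encrypts the bulk data.

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import javax.crypto.Cipher;

class WhichKeyWhen {
    static KeyPair newRsaKeyPair() throws Exception {
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(2048);
        return kpg.generateKeyPair();
    }

    // Encrypt data for a recipient: the RECEIVER's public key.
    static byte[] encryptFor(PublicKey receiverPublic, byte[] data) throws Exception {
        Cipher c = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
        c.init(Cipher.ENCRYPT_MODE, receiverPublic);
        return c.doFinal(data);
    }

    // Decrypt data received: the RECEIVER's private key.
    static byte[] decrypt(PrivateKey receiverPrivate, byte[] data) throws Exception {
        Cipher c = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
        c.init(Cipher.DECRYPT_MODE, receiverPrivate);
        return c.doFinal(data);
    }

    // Sign data: the SENDER's private key.
    static byte[] sign(PrivateKey senderPrivate, byte[] data) throws Exception {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initSign(senderPrivate);
        s.update(data);
        return s.sign();
    }

    // Verify a signature: the SENDER's public key (e.g. taken from the sender's certificate).
    static boolean verify(PublicKey senderPublic, byte[] data, byte[] sig) throws Exception {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initVerify(senderPublic);
        s.update(data);
        return s.verify(sig);
    }

    public static void main(String[] args) throws Exception {
        KeyPair receiver = newRsaKeyPair();
        KeyPair sender = newRsaKeyPair();
        byte[] msg = "sensitive data".getBytes(StandardCharsets.UTF_8);
        byte[] ct = encryptFor(receiver.getPublic(), msg);
        System.out.println(new String(decrypt(receiver.getPrivate(), ct), StandardCharsets.UTF_8));
        System.out.println(verify(sender.getPublic(), msg, sign(sender.getPrivate(), msg)));
    }
}
```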


I have blogged about digital certificates in the past; these posts are good for a quick perusal:
http://www.narendranaidu.com/2007/09/formats-for-digital-certificates.html
http://www.narendranaidu.com/2009/06/creating-self-signed-certificate.html

So where is the private key stored? The private key is stored securely on the server, separately from the certificate, and is never revealed to anyone. When digital certificates are used to enable SSL (https) on a server, the server's certificate-management tools typically shield the user from the challenges of manual key management, as described below:

IIS Windows
On IIS, we use the IIS management console to generate a CSR (Certificate Signing Request). The private key is generated at the same time and stored securely by IIS. When we receive the digital certificate from the CA, we open the "Pending Request" screen of the console, where the certificate is matched to the pending request and associated with its private key.

WebSphere
WebSphere 6.0 and earlier versions come with a tool called iKeyman, which is used to generate a private key file and a certificate request file. In WebSphere 6.1 and later versions, the web-based admin console can be used to manage certificates.

OpenSSL
OpenSSL also works with two files: the digital certificate (server.crt) and the private key (server.key).
If we want to verify that a private key matches a certificate, we can use the commands given here.
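The same check can also be done from Java. The sketch below (class name hypothetical) loads the certificate's public key with CertificateFactory, loads the private key, and signs a random challenge with the private key; the pair matches only if the certificate's public key verifies the signature. It assumes an RSA key in unencrypted PKCS#8 PEM form ("BEGIN PRIVATE KEY"); a traditional "BEGIN RSA PRIVATE KEY" file can be converted with `openssl pkcs8 -topk8 -nocrypt -in server.key -out server-pkcs8.key`.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.KeyFactory;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.security.Signature;
import java.security.cert.CertificateFactory;
import java.security.spec.PKCS8EncodedKeySpec;
import java.util.Base64;

class KeyCertMatcher {
    /** Public key from a PEM or DER X.509 certificate such as server.crt. */
    static PublicKey publicKeyFromCert(Path certFile) throws Exception {
        try (InputStream in = Files.newInputStream(certFile)) {
            return CertificateFactory.getInstance("X.509").generateCertificate(in).getPublicKey();
        }
    }

    /** Assumes an unencrypted PKCS#8 PEM RSA key ("-----BEGIN PRIVATE KEY-----"). */
    static PrivateKey rsaKeyFromPkcs8Pem(Path keyFile) throws Exception {
        String pem = new String(Files.readAllBytes(keyFile), StandardCharsets.US_ASCII);
        String base64 = pem.replace("-----BEGIN PRIVATE KEY-----", "")
                           .replace("-----END PRIVATE KEY-----", "")
                           .replaceAll("\\s", "");
        byte[] der = Base64.getDecoder().decode(base64);
        return KeyFactory.getInstance("RSA").generatePrivate(new PKCS8EncodedKeySpec(der));
    }

    /** The keys match if a signature made with the private key verifies with the public key. */
    static boolean matches(PublicKey publicKey, PrivateKey privateKey) throws Exception {
        byte[] challenge = new byte[32];
        new SecureRandom().nextBytes(challenge);
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(privateKey);
        signer.update(challenge);
        byte[] sig = signer.sign();
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(publicKey);
        verifier.update(challenge);
        return verifier.verify(sig);
    }
}
```

Usage would look like `KeyCertMatcher.matches(KeyCertMatcher.publicKeyFromCert(Paths.get("server.crt")), KeyCertMatcher.rsaKeyFromPkcs8Pem(Paths.get("server-pkcs8.key")))`.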

Monday, October 08, 2012

Random Number Generator Code

We were writing a common utility to help the development team in generating random numbers. The utility would allow the developer to choose between a PRNG and a TRNG, and also to specify the range within which the random number should be generated.

We stopped our efforts midway when we came across the excellent RandomDataGenerator class in the Apache Commons Math library. This library has all the functions that would be required for most use cases.

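For example, to generate a random number between 1 and 1000 (both bounds inclusive), use the following methods:
import org.apache.commons.math3.random.RandomDataGenerator;

RandomDataGenerator randomNumberGenerator = new RandomDataGenerator();
int fastValue = randomNumberGenerator.nextInt(1, 1000);         // not secure for cryptography, but super-fast
int secureValue = randomNumberGenerator.nextSecureInt(1, 1000); // cryptographically secure

To generate random positive integers:
int positiveValue = randomNumberGenerator.nextInt(1, Integer.MAX_VALUE);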
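For projects that cannot take the Commons Math dependency, the original idea of our utility can be sketched with the JDK alone: the caller passes either a java.util.Random (fast PRNG) or a java.security.SecureRandom, which is a subclass of Random. The class name RandomRange is hypothetical; this is a sketch, not a replacement for the library.

```java
import java.security.SecureRandom;
import java.util.Random;

// JDK-only sketch: the caller chooses the generator; both bounds are inclusive.
final class RandomRange {
    private RandomRange() {}

    static int nextInt(Random rng, int lower, int upper) {
        if (lower > upper) throw new IllegalArgumentException("lower > upper");
        long span = (long) upper - lower + 1;            // may exceed Integer.MAX_VALUE
        if (span <= Integer.MAX_VALUE) {
            return lower + rng.nextInt((int) span);      // nextInt(bound) is uniform in [0, bound)
        }
        int r;
        do {                                             // very wide range: rejection sampling
            r = rng.nextInt();
        } while (r < lower || r > upper);
        return r;
    }

    public static void main(String[] args) {
        int fast = nextInt(new Random(), 1, 1000);          // PRNG: fast, predictable
        int secure = nextInt(new SecureRandom(), 1, 1000);  // CSPRNG: for security-sensitive use
        System.out.println(fast + " " + secure);
    }
}
```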

UUID vs SecureRandom in Java

Recently, one of my team members was deliberating between the java.util.UUID class and the java.security.SecureRandom class for a use case that needed a unique random number.

When we dug into the source code of UUID, we were surprised to see that UUID.randomUUID() uses SecureRandom behind the scenes to create a 128-bit (16-byte) GUID.
The second question was the probability of a collision using the UUID class. Wikipedia has a good discussion on this point, available here. It states that even if we were to generate 1 million random UUIDs a second, the chance of a duplicate occurring in our lifetime would be extremely small.
Another challenge is that to even detect a duplicate, we would have to write a powerful algorithm running on a supercomputer that compares 1 million new UUIDs per second against all of the UUIDs we have previously generated... :)

Java's UUID.randomUUID() generates a version 4 (random) UUID, so 6 of the 128 bits are not random: 4 bits hold the version number and 2 bits hold the variant. That leaves 122 random bits, i.e. 2^122 possible values, which is enough for all practical uses.
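Both points can be checked quickly: inspect version() and variant() on a generated UUID, and plug the numbers into the standard birthday-bound approximation p ≈ 1 - exp(-n^2 / (2 * 2^122)). The class name is hypothetical. For 1 million UUIDs per second over 100 years (n ≈ 3.15 x 10^15), the approximation gives a collision probability on the order of 10^-6.

```java
import java.util.UUID;

class UuidFacts {
    /** Birthday-bound approximation for n random version 4 UUIDs (122 random bits). */
    static double collisionProbability(double n) {
        double space = Math.pow(2, 122);
        return -Math.expm1(-(n * n) / (2 * space));   // expm1 keeps precision for tiny values
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();                   // backed by SecureRandom
        System.out.println(id + " version=" + id.version() + " variant=" + id.variant());
        long perSecondFor100Years = 1_000_000L * 3600 * 24 * 365 * 100;
        System.out.println(collisionProbability(perSecondFor100Years));
    }
}
```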

Ruminating on random-ness

Anyone who has dabbled in cryptography would know the difference between a PRNG and a TRNG :)
PRNG - Pseudo Random Number Generator
TRNG - True Random Number Generator

So, what's the real fuss about randomness? And why is it important to understand how random-number generators work?

First and foremost, we have to understand that most random-number generators use a mathematical formula to derive a "random" number from an input called a seed.
Since a deterministic algorithm is used to arrive at the random number, these numbers are called "pseudo-random"; i.e. they appear random to the casual observer, but can be predicted.

For example, the algorithm used by the java.util.Random class (a 48-bit linear congruential generator) is illustrated here. Such a PRNG always produces the same sequence of random numbers for a given seed, even if run on different computers. Hence if someone can guess the initial seed, it is possible to predict the subsequent random numbers. The default constructor of the Random class derives its seed from the system clock, so this option is a little more secure than manually providing the seed. But it is still possible for an attacker to synchronize with the stream of such random numbers and thereby calculate all future random numbers!
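To show how little it takes to "synchronize" with java.util.Random, the sketch below uses the LCG constants documented in the Random Javadoc. Each nextInt() call exposes the top 32 bits of the 48-bit internal state, so from two consecutive outputs we can brute-force the remaining 16 bits (65,536 candidates) and then predict every later value. The class name RandomPredictor is hypothetical; it illustrates the weakness and is not production code.

```java
import java.util.Random;

final class RandomPredictor {
    // Constants of the 48-bit linear congruential generator documented for java.util.Random.
    private static final long MULTIPLIER = 0x5DEECE66DL;
    private static final long ADDEND = 0xBL;
    private static final long MASK = (1L << 48) - 1;

    private long state;

    private RandomPredictor(long state) { this.state = state; }

    /** Recovers the internal state from two consecutive nextInt() values, or returns null. */
    static RandomPredictor fromOutputs(int first, int second) {
        long high = (first & 0xFFFFFFFFL) << 16;          // nextInt() reveals state bits 47..16
        for (long low = 0; low < (1 << 16); low++) {       // brute-force the 16 hidden bits
            long candidate = high | low;
            long next = (candidate * MULTIPLIER + ADDEND) & MASK;
            if ((int) (next >>> 16) == second) {
                return new RandomPredictor(next);
            }
        }
        return null;
    }

    /** Predicts the next value the observed java.util.Random will return from nextInt(). */
    int nextInt() {
        state = (state * MULTIPLIER + ADDEND) & MASK;
        return (int) (state >>> 16);
    }

    public static void main(String[] args) {
        Random victim = new Random();                      // seed derived from the clock
        RandomPredictor predictor = fromOutputs(victim.nextInt(), victim.nextInt());
        for (int i = 0; i < 5; i++) {
            System.out.println("predicted " + predictor.nextInt() + ", actual " + victim.nextInt());
        }
    }
}
```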

So how can we generate true random numbers? There are multiple options here:

1) Use a hardware random number generator. Snippet from Wikipedia:

"Such devices are often based on microscopic phenomena that generate a low-level, statistically random 'noise' signal, such as thermal noise or the photoelectric effect or other quantum phenomena. 
A hardware random number generator typically consists of a transducer to convert some aspect of the physical phenomena to an electrical signal, an amplifier and other electronic circuitry to increase the amplitude of the random fluctuations to a macroscopic level, and some type of analog to digital converter to convert the output into a digital number, often a simple binary digit 0 or 1. 
By repeatedly sampling the randomly varying signal, a series of random numbers is obtained"

2) Use cryptographically secure random number generators (CSPRNGs). Strictly speaking these are still PRNGs, but they are seeded with entropy collected from inputs that are truly unpredictable. For example, on Linux, /dev/random works in a sophisticated manner, capturing the timing of hardware interrupts, CPU cycle counters, network packets, user input from the keyboard, etc. to arrive at a truly random seed.
On Windows, many parameters such as the process ID, thread ID, system clock, system time, system counter, memory status, free disk clusters, the hashed user environment block, etc. are used to seed the PRNG.

On the Java platform, it is recommended to use the java.security.SecureRandom class, as it delegates the gathering of entropy to a CSP (Cryptographic Service Provider), which typically delegates the calls to the OS.
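A minimal sketch of typical SecureRandom usage follows: a single shared instance seeded by its provider (typically from an OS entropy source) generates URL-safe tokens. The class name and the token format are assumptions for illustration. SecureRandom.getInstanceStrong() (Java 8+) is also shown, with the caveat that on some systems it may block while waiting for entropy.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;

final class SecureTokens {
    // The provider seeds this instance itself, typically from an OS entropy source.
    private static final SecureRandom RANDOM = new SecureRandom();

    private SecureTokens() {}

    /** Returns a URL-safe Base64 token built from byteCount random bytes. */
    static String newToken(int byteCount) {
        byte[] bytes = new byte[byteCount];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    /** The platform's "strong" algorithm; may block on some systems while gathering entropy. */
    static SecureRandom strongInstance() throws NoSuchAlgorithmException {
        return SecureRandom.getInstanceStrong();
    }

    public static void main(String[] args) {
        System.out.println(newToken(32));   // 32 random bytes -> 43 Base64 characters
    }
}
```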

3) Use a third-party web service that returns true random numbers. For example, http://www.random.org/ provides random numbers generated from 'atmospheric noise'.
The HotBits site provides random numbers derived from unpredictable radioactive decay.

Wednesday, October 03, 2012

Troubleshooting XML namespaces binding in SOAP request using JAXB

I recently helped a team resolve an issue regarding namespace handling in JAX-WS / CXF. Jotting down the solution, as it might help other folks breaking their heads over this issue :)

Our Java application needed to consume a .NET web service, and we were facing SOAP interoperability challenges. We created the client stubs using the WSDL2Java tool of CXF. The WSDL had elementFormDefault="qualified".

The problem we ran into was that the complex types were all coming back as 'null'. Using a network sniffer, we checked the raw SOAP response reaching the client. The response was OK, with all fields populated. Hence the real issue was in the data binding to the Java objects.

A quick Google search revealed that, by default, JAXB does not throw an exception when it encounters such marshalling/unmarshalling problems; it just reports them as warnings!! This was a shock, as it left us with no error to debug.

We then wrote a sample JAXB application and tried to unmarshal the XML/SOAP message. This is where we started getting the following error:
org.apache.cxf.interceptor.Fault: Unmarshalling Error: unexpected element

This proved that the real culprit was the JAXB data binding: the generated Java classes did not map the XML elements to the correct namespace.

First we checked the generated Java classes and found a file called "package-info.java" that declared the XML namespace through an annotation on the package (a package-level annotation). So this was supposed to work; why was the unmarshaller throwing an exception?

We added the following attributes to the annotation, and the unmarshaller started working!!!
// package-info.java (the package name below is illustrative)
@javax.xml.bind.annotation.XmlSchema(
    namespace = "http://com.mypackage",
    elementFormDefault = javax.xml.bind.annotation.XmlNsForm.QUALIFIED,
    attributeFormDefault = javax.xml.bind.annotation.XmlNsForm.UNQUALIFIED)
package com.mypackage;


Not sure why the WSDL2Java tool did not add these automatically based on the elementFormDefault="qualified" present in the WSDL. But this trick worked, and we could consume the .NET web service. We had to modify the build script to replace the "package-info.java" file every time it was regenerated.
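Why does a missing elementFormDefault make fields come back as null? As we understand it, with elementFormDefault="qualified" the child elements of the response live in the target namespace, while a binding without QUALIFIED expects them in no namespace, so it never matches them. The JDK-only sketch below (element names and the class name are hypothetical) parses two variants of a response with a namespace-aware DOM parser and prints the namespace of the child element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class NamespaceCheck {
    private NamespaceCheck() {}

    /** Namespace URI of the root's first child element; null means "no namespace". */
    static String childNamespace(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);   // namespace processing is off by default
        Element root = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return n.getNamespaceURI();
            }
        }
        throw new IllegalArgumentException("root has no child element");
    }

    public static void main(String[] args) throws Exception {
        // Qualified: the default xmlns puts <Name> in the target namespace.
        String qualified = "<GetCustomerResponse xmlns=\"http://com.mypackage\"><Name>Bob</Name></GetCustomerResponse>";
        // Unqualified: only the root is prefixed, so <Name> is in no namespace.
        String unqualified = "<p:GetCustomerResponse xmlns:p=\"http://com.mypackage\"><Name>Bob</Name></p:GetCustomerResponse>";
        System.out.println(childNamespace(qualified));    // http://com.mypackage
        System.out.println(childNamespace(unqualified));  // null
    }
}
```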

Another option is to manually add the "namespace" attribute to the JAXB annotations (e.g. @XmlElement) in all the generated Java classes, but this is very tedious.