Tuesday, December 15, 2009

What is longitudinal data?

A dataset is considered longitudinal if it tracks the same kind of information about the same subjects at multiple points in time. Examples include the marks of students over multiple years and patient health records collected over a period of time.

The most important advantage of longitudinal data is that we can measure change over time and the effect of various factors on that change. For example: what effect did a particular drug have on a cancer patient? What effect did different teachers have on a student?
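To make the idea of "measuring change" concrete, here is a minimal sketch in Python. The student ids, years, and marks are made up purely for illustration, and the helper name change_per_subject is my own. It stores one row per subject per time point and computes the change between each subject's first and last observation.

```python
from collections import defaultdict

# Hypothetical records: (student_id, year, marks). All values are invented.
records = [
    ("s1", 2007, 62),
    ("s1", 2008, 70),
    ("s1", 2009, 75),
    ("s2", 2007, 81),
    ("s2", 2008, 78),
    ("s2", 2009, 84),
]

def change_per_subject(rows):
    """Return last-minus-first value for every subject in the rows."""
    # Group the observations by subject.
    by_subject = defaultdict(list)
    for subject_id, time_point, value in rows:
        by_subject[subject_id].append((time_point, value))

    changes = {}
    for subject_id, observations in by_subject.items():
        # Sort by time point so the first and last entries are well defined.
        observations.sort()
        first_value = observations[0][1]
        last_value = observations[-1][1]
        changes[subject_id] = last_value - first_value
    return changes

changes = change_per_subject(records)
print(changes)  # {'s1': 13, 's2': 3}
```

The point of the "one row per subject per time point" layout is that the same subject can be followed across time, which is exactly what a one-off snapshot cannot give you.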

So essentially, longitudinal data helps in establishing cause-and-effect relationships, because the same subjects are observed before and after a change. Longitudinal data stores are also being used for predictive modeling and other areas, and they are very popular in the Life Sciences and Healthcare industry.
I am interested in learning the best practices for creating and optimizing a data model for longitudinal data stores.

Difference between biostatistics and bioinformatics

Working in the healthcare domain, I often come across the terms biostatistics and bioinformatics, and I wondered what the differences between the two fields were. A quick Google search revealed the following:
The term biostatistics is a combination of the words biology and statistics, so it is essentially the application of statistics to biology. The science of biostatistics encompasses the design of biological experiments, especially in medicine and agriculture; the collection, summarization, and analysis of data from those experiments; and the interpretation of, and inference from, the results.
Bioinformatics is the application of information technology and computer science to the field of molecular biology. Its primary use has been in genomics and genetics, particularly in those areas of genomics involving large-scale DNA sequencing. Bioinformatics now entails the creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data.
Bioinformatics focuses on applying computationally intensive techniques (e.g., pattern recognition, data mining, machine learning algorithms, and visualization) to understand biological processes.
More information can be found on Wikipedia at: