Wednesday, September 08, 2021

Ruminating on Systems Thinking

Systems Thinking is an approach towards problem solving that emphasizes the fact that most systems are complex and do not have a linear cause-effect relationship. We need to view the problem as part of a wider dynamic system and understand how different parts interact or influence one other. 

The crux of systems thinking is the ability to zoom out and look at the big picture (holistic view) - to understand the various factors or perspectives that play a role. 

By steering away from linear thinking into systems thinking, we will make better decisions in all aspects of business. System thinking also helps us to think in terms of ecosystems and not just be confined by traditional boundaries. 

The following video is an excellent tutorial on Systems Thinking - https://youtu.be/GPW0j2Bo_eY

Other good links showing examples of system thinking: 

- Use of DDT for mosquitos: https://thesystemsthinker.com/making-the-jump-to-systems-thinking/

- Systems thinking for water services - e.g. understanding the impact of rainfall, educating customers to improve the quality of wastewater, external factors such as new building development, rising river levels, etc : https://www.waterindustryjournal.co.uk/united-utilities-rolls-out-systems-thinking-approach-to-wastewater-network-management

Wednesday, September 01, 2021

Ruminating on Data Drift and Concept Drift

 Quite often, the performance (accuracy of prediction) of a AI model degrades with time. One of the reasons this happens is due to a phenomenon called as "Data Drift". 

So what exactly is Data Drift? 

Data Drift can be defined as any change to the structure, semantics or statistical properties of data - i.e. model input data.  

  • Changes to structure of data: New fields being added or old field deleted. This could happen because of a new upgrade to a upstream system, a new sensor, etc. 
  • Changes to semantics of data: A new upstream system is sending temperature in F and not C. 
  • Changes to statistical properties of data: Changes to the atmospheric pressure threshold levels due to environmental changes. There could be also data quality issues such as a bug in upstream system that delivers junk data. 

To maintain the accuracy of our AI models, it is imperative that we measure and monitor Data Drift. Our machine learning infrastructure needs to have tools that automatically detect data drift and can pin-point the features that are causing the drift. 

Changes to the underlying statistical properties of data is also called as "Concept Drift". A classic example of this is the current pandemic. The "behaviour" or "concept" has changed after the pandemic - e.g. models that predict the amount of commute time are no longer valid. Models that forecasted the number of the cosmetic surgeries in 2021 are no longer valid. 

Most of the hyperscalers provide services that enable us to monitor data drift and take proactive actions. The below links provide some examples of cloud services for data drift:

https://docs.microsoft.com/en-us/azure/machine-learning/how-to-monitor-datasets?tabs=python

https://aws.amazon.com/sagemaker/model-monitor/

https://cloud.google.com/blog/topics/developers-practitioners/event-triggered-detection-data-drift-ml-workflows