Monday, May 11, 2026

Ruminating on Human in the Loop (HITL) vs Human on the Loop (HOTL)

 As AI systems become more embedded in enterprise workflows, the conversation is no longer just about capability—it’s about control. Two models often come up in this discussion: 

  • Human in the Loop (HITL) 
  • Human on the Loop (HOTL)

While they sound similar, they represent fundamentally different approaches to how humans interact with AI systems.

Understanding this distinction is critical for designing safe, scalable, and efficient AI-driven processes.



Human in the Loop (HITL): Control Before Action

In the HITL model, humans are directly embedded in the decision-making process. The AI generates outputs, but execution depends on explicit human approval or validation.  

This model is best suited for:

  • High-risk decisions (financial transactions, compliance approvals)
  • Low-confidence AI outputs
  • Regulatory or audit-heavy environments

Think of HITL as a gated workflow: the AI proposes, but the human disposes.

For example, in an ERP system like Oracle Fusion, an AI might recommend vendor payments or flag anomalies—but a finance controller must approve before funds are released. This ensures accountability and reduces the risk of automation errors propagating into real-world impact.

The trade-off is clear: higher reliability and governance, but reduced speed and scalability.

Human on the Loop (HOTL): Control Through Oversight

HOTL shifts the paradigm. Here, AI systems operate autonomously, making decisions and executing actions without requiring prior human approval. Humans remain in a supervisory role and can intervene when necessary. 

This model is ideal for:

  • High-volume, repetitive tasks
  • Real-time decision environments
  • Mature AI systems with proven accuracy

In this setup, the human is not blocking the process—they are monitoring it. A good example is automated fraud detection. An AI system might automatically block suspicious transactions in real time, while human analysts review flagged patterns and adjust thresholds or intervene in edge cases. The system moves fast, but oversight ensures it doesn’t drift into unsafe behavior. 

The trade-off here flips: speed and scalability increase, but it requires strong monitoring, alerting, and fallback mechanisms.

Confusing HITL and HOTL can lead to poorly designed systems. Overusing HITL creates bottlenecks and defeats the purpose of automation. Overusing HOTL without proper guardrails can introduce silent failures at scale.

The real design challenge is deciding:

  • When does AI need approval?
  • When can it act independently?
  • How do we transition from HITL to HOTL as confidence grows?

This is where concepts like confidence thresholds, risk scoring, and progressive autonomy come into play.
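One way to picture progressive autonomy is a tracker that keeps a workflow in HITL until reviewers have agreed with the AI often enough. The sketch below is only an illustration: the `AutonomyTracker` class, the window size, and the promotion threshold are all assumptions made for this example, not part of any specific product.

```python
from collections import deque


class AutonomyTracker:
    """Tracks how often reviewers accept the AI's proposal unchanged."""

    def __init__(self, window: int = 100, promote_at: float = 0.98):
        self.outcomes = deque(maxlen=window)  # True = human agreed with the AI
        self.promote_at = promote_at          # hypothetical promotion threshold

    def record(self, human_agreed: bool) -> None:
        self.outcomes.append(human_agreed)

    def agreement_rate(self) -> float:
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0

    def mode(self) -> str:
        # Stay in HITL until the window is full and agreement is high enough.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        if window_full and self.agreement_rate() >= self.promote_at:
            return "HOTL"
        return "HITL"


tracker = AutonomyTracker(window=5, promote_at=0.8)
for agreed in [True, True, False, True, True]:
    tracker.record(agreed)
print(tracker.mode())  # HOTL (4 of the last 5 reviews agreed)
```

Because the window slides, a run of disagreements would drop the workflow back to HITL, which is one possible way to make autonomy something that is earned and can be revoked.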

The Two-Model Perspective

Another way to combine the two modes is through a two-model system:

  • A decision model that performs the task (e.g., classification, prediction, action)
  • A governance model that determines whether human intervention is required

For instance, an AI might assign a confidence score to its output. If the score is below a defined threshold, the system routes the task into a HITL flow. If it exceeds the threshold, it proceeds autonomously under HOTL. This layered approach allows organizations to dynamically balance risk and efficiency, rather than hardcoding one model across all scenarios.
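A minimal sketch of such a governance step might look like the following. The `Decision` fields, the `govern` function, and the threshold values are hypothetical names and numbers chosen for illustration. The risk score check is an added assumption on top of the confidence check described above.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    HITL = "human approval required before execution"
    HOTL = "execute autonomously, human monitors"


@dataclass
class Decision:
    action: str        # what the decision model proposes
    confidence: float  # model confidence in [0, 1]
    risk_score: float  # hypothetical business risk in [0, 1]


def govern(decision: Decision,
           min_confidence: float = 0.90,
           max_risk: float = 0.50) -> Route:
    """Governance step: decide whether a human must approve first."""
    if decision.confidence < min_confidence or decision.risk_score > max_risk:
        return Route.HITL
    return Route.HOTL


routine = Decision("pay recurring invoice", confidence=0.97, risk_score=0.10)
unusual = Decision("pay new vendor", confidence=0.72, risk_score=0.30)
print(govern(routine).name)  # HOTL
print(govern(unusual).name)  # HITL
```

Keeping the routing rule in one small function means the thresholds can be tuned per process without touching the decision model itself.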

Effective AI governance will increasingly rely on:

  • Dynamic switching between HITL and HOTL
  • Real-time monitoring and explainability
  • Feedback loops that continuously improve both models

Organizations that get this right will not only scale AI faster but also build trust in its decisions. In the end, the question is not whether humans should be involved—it’s how and when.

Tuesday, February 17, 2026

Split before data pre-processing or after?

In machine learning workflows, the standard practice is to split datasets into training and testing subsets before applying most preprocessing transformations to prevent data leakage. 

However, certain preliminary data cleaning operations may be performed safely on the entire dataset beforehand, as they do not depend on statistical summaries or introduce information from the test set into the training process. 

Given below are examples of preprocessing that can be done before splitting. 

  • Removing duplicates
  • Fixing data types - e.g. parsing date strings
  • Removing bad data or impossible values - e.g. age > 150
  • Removing whitespace from strings - e.g. trimming the text

Thus, as long as a step does not compute statistics from the data (for example, a mean used to impute missing values), you can do that preprocessing before the split into training and test sets.
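The cleaning steps listed above can be sketched with nothing but the standard library. The rows and column names here are made up for illustration. The point is that no step computes a statistic such as a mean or a percentile over the dataset.

```python
from datetime import date

raw_rows = [
    {"name": "  Alice ", "age": 34, "joined": "2024-01-15"},
    {"name": "  Alice ", "age": 34, "joined": "2024-01-15"},  # exact duplicate
    {"name": "Bob", "age": 212, "joined": "2023-07-01"},      # impossible age
    {"name": " Carol", "age": 51, "joined": "2022-11-30"},
]


def clean(rows):
    """Statistic-free cleaning that is safe to run before the split."""
    seen, cleaned = set(), []
    for row in rows:
        record = (
            row["name"].strip(),                # trim whitespace
            row["age"],
            date.fromisoformat(row["joined"]),  # fix the data type
        )
        if record in seen:                      # remove duplicates
            continue
        seen.add(record)
        if not 0 <= record[1] <= 150:           # drop impossible values
            continue
        cleaned.append(dict(zip(("name", "age", "joined"), record)))
    return cleaned


cleaned_rows = clean(raw_rows)
print([r["name"] for r in cleaned_rows])  # ['Alice', 'Carol']
```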

Operations involving data-derived statistics, such as imputation with means/medians, standardization, categorical encoding based on category frequencies, or percentile-based outlier removal, must be fitted exclusively on the training set and then applied unchanged to the test set. This kind of preprocessing should only be done after splitting, or you will end up with what is called 'data leakage'.
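As a rough illustration of "fit on train, apply to both", here is standardization written by hand with the standard library. The toy feature values, the 80/20 split, and the `standardize` helper are assumptions made for this sketch.

```python
import random
from statistics import mean, pstdev

random.seed(42)  # fixed seed so the shuffle is reproducible
values = [float(v) for v in range(1, 21)]  # toy feature column
random.shuffle(values)

# 1. Split first.
split_at = int(len(values) * 0.8)
train, test = values[:split_at], values[split_at:]

# 2. Fit the statistics on the training set only.
train_mean = mean(train)
train_std = pstdev(train)


def standardize(xs):
    """Apply the training-set mean and standard deviation to any subset."""
    return [(x - train_mean) / train_std for x in xs]


# 3. Transform both subsets with the same fitted parameters.
train_scaled = standardize(train)
test_scaled = standardize(test)
```

Libraries such as scikit-learn express the same idea through separate fit and transform steps: fit on the training data, then transform both subsets.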

So what exactly is data leakage? You can understand it with the following analogy. 
  • Imagine you're studying for an exam.
  • You’re supposed to practice using your textbook (training data) and then take the exam (test data) to see how well you’ve learned.
  • Now imagine someone secretly shows you some of the exam questions while you’re studying.
  • When you take the test, you score really high, but not because you truly understood the material. You just recognized the questions. That’s data leakage!
In simple terms:
  • The training data is what the model learns from.
  • The test data is supposed to check how well it learned.
  • If information from the test data sneaks into training, the model gets an unfair advantage.
  • It looks like it performs very well.
  • But when you give it completely new data in the real world, performance drops.
  • So data leakage makes the model look smarter than it actually is — and that’s dangerous because it won’t work as well in real-life situations.
Here is an example where imputation is wrongly done before splitting:
  • Suppose you are building a model to predict house prices, and the dataset contains missing values in the feature “Lot Size.”
  • You calculate the mean lot size using the entire dataset (including both training and test data) and use that value to fill in all missing entries.
  • After performing this imputation, you split the data into training and test sets.
  • This creates data leakage because the imputed values were influenced by information from the test set.
  • As a result, the model’s evaluation may appear more accurate than it truly is, since the training process indirectly incorporated knowledge from unseen data.
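The lot size scenario above can be reproduced with a few toy numbers. The values are invented for illustration, and the split is written out by hand so the effect is easy to see.

```python
from statistics import mean

# Toy "Lot Size" values; None marks a missing entry.
train_lot_size = [5000.0, 6000.0, None, 7000.0]
test_lot_size = [20000.0, None]


def observed(values):
    """Return only the non-missing values."""
    return [v for v in values if v is not None]


def fill_missing(values, fill_value):
    """Replace every missing entry with the given fill value."""
    return [fill_value if v is None else v for v in values]


# Leaky: the fill value is computed from training AND test rows.
leaky_fill = mean(observed(train_lot_size + test_lot_size))

# Correct: the fill value comes from the training rows only...
train_fill = mean(observed(train_lot_size))

# ...and that same training-derived value is reused on the test set.
train_imputed = fill_missing(train_lot_size, train_fill)
test_imputed = fill_missing(test_lot_size, train_fill)

print(leaky_fill, train_fill)  # 9500.0 6000.0
```

The large lot in the test set pulls the leaky mean up to 9500.0, so training rows filled with it would carry information the model should never have seen.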
Another form of data leakage is target leakage, explained here - https://www.narendranaidu.com/2026/02/ruminating-on-target-leakage-in-ml.html

Ruminating on target leakage in ML models

Target leakage is a type of data leakage where training data includes info directly tied to the outcome (target variable), but that info wouldn't exist at prediction time. Your model "cheats" during training, looks amazing on paper, but fails on new data. This often sneaks in through feature engineering or data collection, and it produces evaluation scores that are far too optimistic.

Examples:

  • You're building a model to spot who'll get a sinus infection. Your dataset has a feature "took_antibiotics." Sounds useful, right? Wrong—patients take antibiotics after getting sick, so this feature leaks the target. Drop it!
  • Predicting if employees will quit. Including "retention_bonus_offered" leaks info because bonuses come after quit signals, not before. The model learns from a reaction to churn, not its causes.
  • In credit card fraud prediction, using "chargeback_filed" as a feature is leakage gold. Chargebacks happen post-fraud, so the model peeks at the future.

Golden rule to avoid target leakage: Always ask: "Would the value of this feature be known at the moment the prediction is made?" If no, remove it.
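One simple way to apply this rule is to tag every candidate feature with when its value becomes known. The catalogue below is a hypothetical one for the fraud example. Apart from "chargeback_filed", the feature names are invented for illustration.

```python
# Hypothetical feature catalogue: each feature is tagged with when its value
# becomes known relative to the event being predicted.
FEATURE_TIMING = {
    "transaction_amount": "before",
    "merchant_category": "before",
    "card_age_days": "before",
    "chargeback_filed": "after",  # only known once fraud has been reported
}


def usable_features(timing):
    """Keep only features whose values exist at prediction time."""
    return [name for name, when in timing.items() if when == "before"]


def leaky_features(timing):
    """List features that would leak the target and should be dropped."""
    return [name for name, when in timing.items() if when != "before"]


print(usable_features(FEATURE_TIMING))
print(leaky_features(FEATURE_TIMING))  # ['chargeback_filed']
```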