In neural nets, we have to specify the number of epochs while we train the model.

One **Epoch **is defined as the complete forward & backward pass of the neural net over the complete training dataset. We need to remember that Gradient Descent is an iterative process and hence we need multiple passes (or epochs) to find the optimal solution. In each epoch, the weights and biases are updated.

**Batch size** is the number of records in one batch. One Epoch may consist of multiple batches. The training dataset is split into batches because of memory space requirements. A large dataset cannot be fit into memory all at once. With increase in Batch size, required memory space increases.

**Iterations **is the number of batches needed to complete one epoch. So if we have 2000 records and a batch size of 500, then we will need 4 iterations to complete one epoch.

If the number of epochs are low, then it results in underfitting the data. As the number of epochs increases, more number of times the weight are changed in the neural network and the curve goes from underfitting to optimal to overfitting curve. *Then how do we determine the optimal number of epochs?*

It is done by splitting the training data into 2 subsets - a) 80% for training the neural net b) 20% for validating the model after each epoch. The fundamental reason we split the dataset into a validation set is to prevent our model from overfitting. The model is trained on the training set, and, simultaneously, the model evaluation is performed on the validation set after every epoch.

Then we use something called as the "early stopping method"- we essentially keep training the neural net for an arbitrary number of epochs and monitor the performance on the validation dataset after each epoch. When there is no sign of performance improvement on your validation dataset, you should stop training your network. This helps us arrive at the optimal number of epochs.

A good explanation of these concepts is available here - https://www.v7labs.com/blog/train-validation-test-set. Found the below illustration really useful to understand the split of data between train/validate/test sets.