I was going through TensorFlow's "Classification on imbalanced data" tutorial, which uses Kaggle's Credit Card Fraud Detection dataset. According to the tutorial, there are 182276 training examples and 45569 validation examples. To evaluate the baseline model, it uses Keras's built-in metrics: TruePositives, FalsePositives, TrueNegatives and FalseNegatives.
However, if you look at the training logs in the "Train the model" section, the sum TP+FP+TN+FN is not equal to the number of training examples. Likewise, the sum reported for the validation data is not equal to the number of validation examples.
Part 1
EPOCH 1
TP = 64
FP = 25
TN = 139431.9780
FN = 188.3956
TP+FP+TN+FN = 139709.3736
The above sum is nowhere close to 182276. The same is true for all subsequent epochs. Why is this the case?
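To make the mismatch concrete, here is a quick check of the arithmetic in plain Python. It is only a sketch: the four values are copied from the epoch 1 log above, and the variable names are my own.

```python
# Epoch 1 training values, copied from the log above.
epoch1 = {"tp": 64.0, "fp": 25.0, "tn": 139431.9780, "fn": 188.3956}

# Number of training examples reported in the tutorial.
n_train = 182276

# Sum of the four confusion-matrix entries and how far it falls short.
total = sum(epoch1.values())
shortfall = n_train - total

print(f"TP+FP+TN+FN = {total:.4f}")
print(f"shortfall   = {shortfall:.4f}")
```

So for epoch 1, tens of thousands of training examples appear to be missing from the logged values.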
Part 2
As the number of epochs increases, the total sum decreases further. For example, compare the values for epochs 1 and 2.
EPOCH 2
TP = 25
FP = 5.67
TN = 93973.1538
FN = 136.2967
TP+FP+TN+FN = 94140.1205
The total sum has now dropped by a further 45569 or so. The same is true for later epochs.
- Shouldn't the total sum be the same?
- If not, then why does it keep decreasing?
Part 3
Why are the values for TP, FP, FN and TN floating-point numbers in both training and validation? As I understand it, these should always be integers: according to the explanation in the "Understanding useful metrics" section, the values represent counts.
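To make my expectation concrete, here is a minimal sketch in plain Python of how I understand these four counts. This is not the Keras implementation: the helper name `confusion_counts`, the toy labels and probabilities, and the 0.5 threshold are all made up for illustration.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count TP, FP, TN, FN for binary labels and predicted probabilities."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        # Assumption: a probability above the threshold is a positive prediction.
        predicted = 1 if prob > threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


# Hypothetical toy data: 8 examples, 3 of them positive.
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_prob = [0.1, 0.7, 0.8, 0.3, 0.2, 0.9, 0.4, 0.05]

tp, fp, tn, fn = confusion_counts(y_true, y_prob)
print(tp, fp, tn, fn)
print(tp + fp + tn + fn == len(y_true))
```

Counted this way, every value is an integer and the four always add up to the number of examples, which is what I expected to see in the training logs.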