I have a job that can take several hours to run. Occasionally it fails for some reason, such as running out of memory or a cluster rebalance. The job usually runs overnight, so someone has to check on it in the morning and restart it manually (which is enough most of the time). I was wondering whether this problem can be solved with Spring Cloud Data Flow (SCDF).
Ideally, I would like SCDF to send an email (or call a webhook) when the job finishes, whether it succeeded or failed, and to retry the entire job if it fails. Is that possible?
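To make the desired behaviour concrete, here is a rough sketch in plain Java of what I have in mind. It is only an illustration: `RetryingJobRunner`, `Job`, and `runWithRetry` are hypothetical names of my own, not SCDF or Spring Batch APIs, and the notifier is just a callback standing in for the email or webhook call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper for illustration; not part of Spring Cloud Data Flow or Spring Batch. */
final class RetryingJobRunner {

    /** Stand-in for the real overnight job; it signals failure by throwing. */
    @FunctionalInterface
    interface Job {
        void run() throws Exception;
    }

    /**
     * Runs the whole job up to maxAttempts times and reports the outcome of
     * every attempt to the notifier (a placeholder for email or a webhook).
     * Returns true as soon as one attempt succeeds, false if all attempts fail.
     */
    static boolean runWithRetry(Job job, int maxAttempts, Consumer<String> notifier) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                job.run();
                notifier.accept("SUCCESS on attempt " + attempt);
                return true;
            } catch (Exception e) {
                // Only exceptions are caught here; a JVM crash (for example a real
                // OutOfMemoryError) would kill this loop too.
                notifier.accept("FAILED on attempt " + attempt + ": " + e.getMessage());
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> notifications = new ArrayList<>();
        int[] calls = {0};

        // Simulated job: fails on the first two attempts, then succeeds.
        Job flaky = () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated failure");
            }
        };

        boolean ok = runWithRetry(flaky, 5, notifications::add);
        notifications.forEach(System.out::println);
        System.out.println("finished successfully: " + ok);
    }
}
```

An in-process loop like this cannot survive the process itself dying, which is why I would prefer the retry and the notification to be handled from outside the job, by SCDF or something similar.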
Question from: https://stackoverflow.com/questions/66064252/automatic-job-restart