KubeJobNotCompleted
Description
This alert fires when a Kubernetes Job has not completed within the expected time window.
It indicates that the Job is still running, retrying, or blocked and has not reached a successful completion state, which may delay batch processing, maintenance tasks, or dependent workflows.
Possible Causes:
- Long-running or stuck Job execution
- Job pods repeatedly restarting or retrying
- Insufficient resources (CPU, memory, ephemeral storage)
- Scheduling issues (taints, node selectors, affinity rules)
- Failing init containers or sidecars
- External dependency delays (databases, APIs, storage systems)
- Misconfigured activeDeadlineSeconds
- Node issues or pod eviction
- Image pull delays
Severity estimation
Medium severity by default, increasing with duration and criticality.
- Low if the Job is expected to run long and progress is observed
- Medium if the Job is blocking internal or periodic processes
- High if the Job is part of critical workflows (backups, migrations, billing)
- Critical if the Job blocks production operations or downstream systems
Severity increases if:
- The Job exceeds its normal runtime significantly
- Multiple Jobs are affected
- A scheduled (CronJob) execution overlaps with the next run
Troubleshooting steps
-
Check Job status
- Command / Action:
- Inspect Job completion and active pod counts
-
kubectl get job <job-name> -n <namespace>
- Expected result:
- Job shows COMPLETIONS met (COMPLETIONS=1/1)
- additional info:
- ACTIVE > 0 for a long time indicates the Job is not completing
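The status check above can be scripted. The sketch below is one possible helper, not part of the original runbook: `job_state` is a hypothetical function name, and it assumes a Job with no explicit `completions` should be treated as a single-completion Job.

```shell
# job_state SUCCEEDED COMPLETIONS ACTIVE
# Prints: complete | running | no-active-pods
# Empty arguments fall back to defaults, because kubectl prints nothing
# for status fields that are unset (for example .status.succeeded before
# any pod has finished).
job_state() {
  succeeded="${1:-0}"
  completions="${2:-1}"   # assumption: single-completion Job when unset
  active="${3:-0}"
  if [ "$succeeded" -ge "$completions" ]; then
    echo "complete"
  elif [ "$active" -gt 0 ]; then
    echo "running"
  else
    echo "no-active-pods"
  fi
}

# Example wiring (needs cluster access; placeholders as in the runbook):
#   job_state \
#     "$(kubectl get job <job-name> -n <namespace> -o jsonpath='{.status.succeeded}')" \
#     "$(kubectl get job <job-name> -n <namespace> -o jsonpath='{.spec.completions}')" \
#     "$(kubectl get job <job-name> -n <namespace> -o jsonpath='{.status.active}')"
```

A result of `no-active-pods` means the Job is neither complete nor running, which usually points to a failed, suspended, or unschedulable Job rather than a slow one.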
-
Describe the Job
- Command / Action:
- Review Job events and progress
-
kubectl describe job <job-name> -n <namespace>
- Expected result:
- Events show normal pod execution
- additional info:
- Look for repeated retries or deadline warnings
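To make the retry and deadline warnings easier to interpret, the sketch below maps the reason on a Job's `Failed` condition to a short explanation. `explain_job_reason` is a hypothetical helper name; `DeadlineExceeded` and `BackoffLimitExceeded` are the reasons the Job controller normally reports, and anything else is passed through unchanged.

```shell
# explain_job_reason REASON
# Translates the reason of a Job's Failed condition into plain language.
explain_job_reason() {
  case "$1" in
    DeadlineExceeded)
      echo "Job ran longer than activeDeadlineSeconds" ;;
    BackoffLimitExceeded)
      echo "Job used up its backoffLimit retries" ;;
    "")
      echo "no Failed condition recorded" ;;
    *)
      echo "unrecognized reason: $1" ;;
  esac
}

# Example wiring (needs cluster access; placeholders as in the runbook):
#   reason=$(kubectl get job <job-name> -n <namespace> \
#     -o jsonpath='{.status.conditions[?(@.type=="Failed")].reason}')
#   explain_job_reason "$reason"
```

An empty reason on a Job that is still not complete suggests it is slow or blocked rather than failed, so the pod-level steps are the next place to look.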
-
Inspect Job pods
- Command / Action:
- List pods created by the Job
-
kubectl get pods -n <namespace> --selector=job-name=<job-name>
- Expected result:
- Pods are Running or Completed
- additional info:
- Pods stuck in Pending or restarting indicate issues
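When a Job has many pods, filtering the list can be faster than reading it. The following is a rough sketch: `flag_problem_pods` is a hypothetical helper that reads `name phase restarts` lines on stdin and prints only pods that are Pending, Failed, or have restarted. It assumes the restart count of the first container is representative.

```shell
# flag_problem_pods
# stdin:  one line per pod, "<name> <phase> <restarts>"
# stdout: only the pods that look unhealthy
flag_problem_pods() {
  awk '$2 == "Pending" || $2 == "Failed" || ($3 + 0) > 0 {
    print $1 ": phase=" $2 ", restarts=" $3
  }'
}

# Example wiring (needs cluster access; placeholders as in the runbook):
#   kubectl get pods -n <namespace> --selector=job-name=<job-name> --no-headers \
#     -o 'custom-columns=NAME:.metadata.name,PHASE:.status.phase,RESTARTS:.status.containerStatuses[0].restartCount' \
#     | flag_problem_pods
```

Pods that have not started any container report `<none>` in the restarts column; the filter treats that as zero and relies on the phase instead.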
-
Describe running or stuck pods
- Command / Action:
- Inspect pod details and events
-
kubectl describe pod <pod-name> -n <namespace>
- Expected result:
- Pod is progressing without repeated failures
- additional info:
- Check for resource limits, scheduling, or volume mount issues
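The output of `kubectl describe pod` can be long. As a rough aid, the sketch below scans it for a handful of well-known event reasons and container states. `pod_red_flags` is a hypothetical helper name, and the keyword list is an assumption that covers common cases only, not every possible failure.

```shell
# pod_red_flags
# stdin:  output of `kubectl describe pod`
# stdout: the distinct warning keywords found, one per line, sorted
pod_red_flags() {
  grep -E -o 'FailedScheduling|FailedMount|OOMKilled|Evicted|ImagePullBackOff|ErrImagePull|CrashLoopBackOff' \
    | sort -u
}

# Example wiring (needs cluster access; placeholders as in the runbook):
#   kubectl describe pod <pod-name> -n <namespace> | pod_red_flags
```

Empty output only means none of the listed keywords appeared; the events section is still worth reading in full.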
-
Check container logs
- Command / Action:
- Review logs from running or restarting containers
-
kubectl logs <pod-name> -n <namespace>
- Expected result:
- Logs show forward progress
- additional info:
- If logs are not advancing, the Job may be stuck
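One simple way to judge whether logs are advancing is to compare the last log line at two points in time. The sketch below assumes the Job logs regularly enough for that comparison to be meaningful. `log_progress` is a hypothetical helper, and the 60-second interval in the usage comment is an arbitrary choice.

```shell
# log_progress FIRST_SAMPLE SECOND_SAMPLE
# Compares the last log line captured at two points in time.
log_progress() {
  if [ "$1" = "$2" ]; then
    echo "stalled"
  else
    echo "advancing"
  fi
}

# Example wiring (needs cluster access; placeholders as in the runbook):
#   first=$(kubectl logs <pod-name> -n <namespace> --timestamps --tail=1)
#   sleep 60
#   second=$(kubectl logs <pod-name> -n <namespace> --timestamps --tail=1)
#   log_progress "$first" "$second"
#
# For a container that has restarted, the previous attempt's output is
# available with:
#   kubectl logs <pod-name> -n <namespace> --previous
```

The `--timestamps` flag matters here: without it, a Job that repeats the same message would look stalled even while it is working.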
-
Verify Job deadlines and retries
- Command / Action:
- Review Job configuration
-
kubectl get job <job-name> -n <namespace> -o yaml
- Expected result:
- activeDeadlineSeconds and retry settings (backoffLimit) are appropriate
- additional info:
- Missing or too-high deadlines can cause Jobs to run indefinitely
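To see how close a running Job is to its deadline, the arithmetic is start time plus `activeDeadlineSeconds` minus the current time. The sketch below is illustrative only: `deadline_remaining` is a hypothetical helper, and the usage comment assumes GNU `date` for parsing the Job's start timestamp.

```shell
# deadline_remaining START_EPOCH DEADLINE_SECONDS NOW_EPOCH
# Prints the seconds left before the deadline; negative means it has passed.
deadline_remaining() {
  echo $(( $1 + $2 - $3 ))
}

# Example wiring (needs cluster access and GNU date; placeholders as in
# the runbook):
#   start=$(date -d "$(kubectl get job <job-name> -n <namespace> \
#     -o jsonpath='{.status.startTime}')" +%s)
#   limit=$(kubectl get job <job-name> -n <namespace> \
#     -o jsonpath='{.spec.activeDeadlineSeconds}')
#   deadline_remaining "$start" "$limit" "$(date +%s)"
```

If `activeDeadlineSeconds` is unset, the `limit` variable is empty and the Job has no deadline at all, which is the unbounded-runtime case described above.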
-
Terminate and recreate the Job if necessary
- Command / Action:
- Stop the Job and recreate it after fixing the issue
-
kubectl delete job <job-name> -n <namespace>
-
kubectl apply -f <job-manifest>.yaml
- Expected result:
- Job completes successfully
- additional info:
- Ensure the underlying cause is resolved before rerunning
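Deleting a Job also removes its pods by default, so their logs are lost unless saved first. The sketch below strings the delete and re-apply steps together with a log capture and a completion wait. `recreate_job` is a hypothetical helper, and the log location and 600-second timeout are assumptions to adjust.

```shell
# recreate_job NAMESPACE JOB_NAME MANIFEST
# Saves logs, deletes the Job, re-applies the manifest, waits for completion.
recreate_job() {
  ns="$1"; job="$2"; manifest="$3"
  logfile="${TMPDIR:-/tmp}/$job.log"

  # Best effort: keep the old output for later analysis.
  kubectl logs "job/$job" -n "$ns" --all-containers > "$logfile" 2>&1 || true

  # Stop at the first step that fails.
  kubectl delete job "$job" -n "$ns" --wait=true &&
  kubectl apply -f "$manifest" -n "$ns" &&
  kubectl wait --for=condition=complete "job/$job" -n "$ns" --timeout=600s
}

# Example (needs cluster access; placeholders as in the runbook):
#   recreate_job <namespace> <job-name> <job-manifest>.yaml
```

`kubectl logs job/<job-name>` reads from a single pod of the Job, so for Jobs with several pods it may be worth saving each pod's logs individually before deleting.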
Additional resources
- Kubernetes Job documentation
- Kubernetes Pod lifecycle and troubleshooting
- Related alert: KubeJobFailed