Batch-scheduler
Background
The default Kubernetes scheduler schedules pods one by one and cannot ensure that a whole group of pods gets scheduled. In some scenarios this wastes resources, because some pods must work together, as in Spark, TensorFlow and similar workloads. batch-scheduler aims to solve this issue.
Method
Features
- lightweight
- no resource race
- gang scheduling
Implementation
The scheduler is built on the latest scheduling framework, so only one scheduler needs to run in the cluster, which ensures that no resource race occurs.
The scheduler also guarantees gang scheduling, e.g.
- scene1
A group consists of 5 pods. The batch-scheduler does not schedule any pod of the group until enough resources are found.
- scene2
Only 6 CPUs exist in the cluster. Two groups, each requiring 5 CPUs, are submitted. Exactly one group is scheduled: never both, and never neither.
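The all-or-nothing behaviour of the two scenes can be illustrated with a minimal Go sketch. It is not the scheduler's actual code: the group type, the admit function and the use of a single CPU figure per group are assumptions made only for illustration.

```go
package main

import "fmt"

// group is a hypothetical stand-in for a PodGroup: a name plus the total
// CPU (in millicores) that all of its pods request together.
type group struct {
	name     string
	cpuMilli int64
}

// admit walks the groups in submission order and admits a group only if its
// whole CPU demand fits into what is still free. A group that does not fit
// is skipped entirely, so none of its pods hold resources while waiting.
func admit(freeMilli int64, groups []group) []string {
	var admitted []string
	for _, g := range groups {
		if g.cpuMilli <= freeMilli {
			freeMilli -= g.cpuMilli
			admitted = append(admitted, g.name)
		}
	}
	return admitted
}

func main() {
	// Scene 2: 6 CPUs in the cluster, two groups of 5 CPUs each.
	groups := []group{{"group1", 5000}, {"group2", 5000}}
	fmt.Println(admit(6000, groups)) // prints [group1]
}
```

With per-pod scheduling, pods of both groups could each grab part of the 6 CPUs and neither group would ever be complete; admitting whole groups avoids that.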
How is it kept lightweight? We define a CRD named PodGroup. To run a group of pods, just submit a PodGroup, e.g. group1, to the cluster. Pods that need to run as part of the group only have to add the label group.batch.scheduler.tencent.com: group1
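As a rough illustration of how the label ties pods to a PodGroup, the Go sketch below reads the label from a pod and collects pods per group. The pod type and the groupOf and byGroup helpers are hypothetical; only the label key comes from the text above.

```go
package main

import "fmt"

// groupLabel is the label key taken from the text above.
const groupLabel = "group.batch.scheduler.tencent.com"

// pod is a hypothetical, trimmed-down pod: just a name and its labels.
type pod struct {
	name   string
	labels map[string]string
}

// groupOf returns the PodGroup name a pod refers to, and false if the pod
// carries no group label (or an empty one).
func groupOf(p pod) (string, bool) {
	name, ok := p.labels[groupLabel]
	return name, ok && name != ""
}

// byGroup collects pod names per PodGroup name, ignoring unlabeled pods.
func byGroup(pods []pod) map[string][]string {
	out := make(map[string][]string)
	for _, p := range pods {
		if g, ok := groupOf(p); ok {
			out[g] = append(out[g], p.name)
		}
	}
	return out
}

func main() {
	pods := []pod{
		{"web-0", map[string]string{groupLabel: "group1"}},
		{"web-1", map[string]string{groupLabel: "group1"}},
		{"solo", map[string]string{"app": "nginx"}},
	}
	fmt.Println(byGroup(pods)) // prints map[group1:[web-0 web-1]]
}
```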
Main Process
- PreFilter: computes the resource requirements before the predicates start for a pod. If a pod is not permitted, it is added to the freeze cache, and other pods belonging to the same group are then rejected directly.
- Less: this interface decides the order of pods. Currently, pods with a higher Priority are scheduled first. If pods have the same priority, the PodGroup creation times are compared.
- Permit: used to approve a list of pods.
- It is better to set MaxScheduleTime for a PodGroup. If one of the pods belonging to the PodGroup times out, the other pods are rejected as well.
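The ordering rule described for Less can be sketched in Go as follows. This is a simplified illustration, not the plugin's real interface: the queuedPod type and the at helper are hypothetical, and treating the earlier-created PodGroup as first is an assumption, since the text only says the creation times are compared.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// queuedPod is a hypothetical view of a pod waiting in the scheduling queue.
type queuedPod struct {
	name         string
	priority     int32     // pod priority; higher is scheduled first
	groupCreated time.Time // creation time of the pod's PodGroup
}

// at is a small helper returning a fixed base time plus the given minutes.
func at(minutes int) time.Time {
	base := time.Date(2020, 1, 1, 0, 0, 0, 0, time.UTC)
	return base.Add(time.Duration(minutes) * time.Minute)
}

// less reports whether a should be scheduled before b: higher priority wins,
// and on equal priority the pod whose PodGroup was created earlier wins
// (earlier-first is an assumption, see the note above).
func less(a, b queuedPod) bool {
	if a.priority != b.priority {
		return a.priority > b.priority
	}
	return a.groupCreated.Before(b.groupCreated)
}

func main() {
	queue := []queuedPod{
		{"late-low", 0, at(1)},
		{"early-low", 0, at(0)},
		{"late-high", 10, at(2)},
	}
	sort.SliceStable(queue, func(i, j int) bool { return less(queue[i], queue[j]) })
	for _, p := range queue {
		fmt.Println(p.name)
	}
	// prints late-high, early-low, late-low (one per line)
}
```

Comparing by PodGroup creation time rather than pod creation time keeps pods of the same group adjacent in the queue, which helps a group get scheduled as a unit.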
Build
git clone [email protected]/tenstack/batch-scheduler.git
make build
Deploy
- Deploy CRD
# cd deploy
# kubectl apply -f crd.yaml
- Configuration
A default config is provided, but the kube_config entry in it should be changed to point to your own kubeconfig.
- Deploy batch-scheduler
# cd deploy
# bash start.sh
Example
This example shows the resource race scenario. The cluster has only 8 CPUs, and 0.9 CPU (900m) is already requested.
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                900m (11%)  0 (0%)
  memory             140Mi (0%)  340Mi (2%)
  ephemeral-storage  0 (0%)      0 (0%)
Events:              <none>
- The YAML file named sts-group-valid-race.yaml is as follows
apiVersion: batch.scheduler.tencent.com/v1
kind: PodGroup
metadata:
  name: group1
  namespace: default
spec:
  minMember: 5
---
apiVersion: batch.scheduler.tencent.com/v1
kind: PodGroup
metadata:
  name: group2
  namespace: default
spec:
  minMember: 5
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web-group-race1
spec:
  selector:
    matchLabels:
      app: nginx
  podManagementPolicy: Parallel
  serviceName: "nginx"
  replicas: 5
  template:
    metadata:
      labels:
        group.batch.scheduler.tencent.com: "group1"
        app: nginx
        type: node
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          name: web
        resources:
          limits:
            cpu: "1"
          requests:
            cpu: "1"
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web-group-race2
spec:
  selector:
    matchLabels:
      app: nginx
  podManagementPolicy: Parallel
  serviceName: "nginx"
  replicas: 5
  template:
    metadata:
      labels:
        group.batch.scheduler.tencent.com: "group2"
        app: nginx
        type: node
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          name: web
        resources:
          limits:
            cpu: "1"
          requests:
            cpu: "1"
- Submit it
# kubectl apply -f sts-group-valid-race.yaml
- Result
# kubectl get pod
NAME                READY   STATUS              RESTARTS   AGE
web-group-race1-0   0/1     ContainerCreating   0          16s
web-group-race1-1   0/1     ContainerCreating   0          16s
web-group-race1-2   0/1     ContainerCreating   0          16s
web-group-race1-3   1/1     Running             0          16s
web-group-race1-4   0/1     ContainerCreating   0          16s
web-group-race2-0   0/1     Pending             0          16s
web-group-race2-1   0/1     Pending             0          16s
web-group-race2-2   0/1     Pending             0          16s
web-group-race2-3   0/1     Pending             0          16s
web-group-race2-4   0/1     Pending             0          16s