@chenkovsky
Last active January 31, 2025 12:02
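Debugging Celeborn Locally

When developing Celeborn, you often have to package and deploy a build to verify a change, which makes for a long feedback loop. Relying only on Celeborn's built-in unit tests, on the other hand, makes it hard to simulate a real environment. This note shares my experience debugging a distributed system on a single machine.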

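Preparing the Docker environment

I use a MacBook, so I installed Colima. On other systems, any working Docker environment will do.

First, start the Docker environment. Celeborn needs a fair amount of memory, so set the CPU count and memory size (in GiB) explicitly.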

colima start --cpu 6 --memory 12

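Deploying a Kind cluster

First, install Kind.

Create a kind cluster configuration file locally (named config.yaml here) and copy the content below into it. Replace {workspace} with the directory that contains the celeborn source tree.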

apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
nodes:
  - role: control-plane
  - role: worker
    extraMounts:
    - hostPath: {workspace}/celeborn/dist
      containerPath: /opt/celeborn
  - role: worker
    extraMounts:
    - hostPath: {workspace}/celeborn/dist
      containerPath: /opt/celeborn
  - role: worker
    extraMounts:
    - hostPath: {workspace}/celeborn/dist
      containerPath: /opt/celeborn
  - role: worker
    extraMounts:
    - hostPath: {workspace}/celeborn/dist
      containerPath: /opt/celeborn
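
The placeholder substitution can also be scripted instead of edited by hand. The following is a minimal sketch, assuming a POSIX shell; `gen_kind_config` is a hypothetical helper name (not part of kind), and the worker count defaults to the four workers used in the configuration above.

```shell
#!/bin/sh
# Hypothetical helper: print a kind config with one control-plane node and
# N worker nodes, each mounting <workspace>/celeborn/dist at /opt/celeborn.
gen_kind_config() {
  workspace="$1"      # directory that contains the celeborn checkout
  workers="${2:-4}"   # number of worker nodes (assumed default: 4)
  printf 'apiVersion: kind.x-k8s.io/v1alpha4\n'
  printf 'kind: Cluster\n'
  printf 'nodes:\n'
  printf '  - role: control-plane\n'
  i=0
  while [ "$i" -lt "$workers" ]; do
    printf '  - role: worker\n'
    printf '    extraMounts:\n'
    printf '    - hostPath: %s/celeborn/dist\n' "$workspace"
    printf '      containerPath: /opt/celeborn\n'
    i=$((i + 1))
  done
}

# Usage (the path is an example only):
#   gen_kind_config "$HOME/code" > config.yaml
```

Generating the file keeps the four identical worker entries in sync when the workspace path changes.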

Create the kind cluster from this configuration:

kind create cluster --config config.yaml

On first use, you may need to build the Celeborn image, and then load the celeborn and alpine images into the kind cluster:

kind load docker-image celeborn
kind load docker-image alpine:3.18
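
To confirm that the images actually reached the nodes, one option is to list the images inside each node container. This is a sketch under assumptions: it targets the default kind cluster, relies on `crictl` being available inside kind node containers, and `image_on_all_nodes` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: check that an image is present on every node of the default kind cluster.
# image_on_all_nodes is a hypothetical helper, not a kind or docker command.
image_on_all_nodes() {
  image="$1"
  for node in $(kind get nodes); do
    # Each kind node is a Docker container; crictl lists the images it holds.
    if ! docker exec "$node" crictl images | grep -q "$image"; then
      echo "missing $image on $node"
      return 1
    fi
  done
  echo "$image present on all nodes"
}

# Usage:
#   image_on_all_nodes celeborn
#   image_on_all_nodes alpine
```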

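Modifying the Celeborn Helm chart

The reason for changing the chart is that I want to mount the dist directory, which holds Celeborn's build output, into the containers. Once the Maven build finishes, restarting a pod picks up the new jars right away. There is no need to rebuild the image or redeploy, which speeds up the development loop.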

diff --git a/charts/celeborn/templates/master/statefulset.yaml b/charts/celeborn/templates/master/statefulset.yaml
index 7d3fe6e..0f9c38b 100644
--- a/charts/celeborn/templates/master/statefulset.yaml
+++ b/charts/celeborn/templates/master/statefulset.yaml
@@ -102,6 +102,10 @@ spec:
         - name: {{ $.Release.Name }}-master-vol-{{ $index }}
           mountPath: {{ .mountPath }}
         {{- end }}
+        {{- range $index, $volume := .Values.volumes.hot_loads }}
+        - name: {{ $.Release.Name }}-master-hot-load-vol-{{ $index }}
+          mountPath: {{ .mountPath }}
+        {{- end }}
         {{- with .Values.resources.master }}
         resources:
           {{- toYaml . | nindent 10 }}
@@ -127,6 +131,12 @@ spec:
       {{ fail "For now Celeborn Helm only support emptyDir or hostPath volume types" }}
       {{- end }}
       {{- end }}
+      {{- range $index, $volume := .Values.volumes.hot_loads }}
+      - name: {{ $.Release.Name }}-master-hot-load-vol-{{ $index }}
+        hostPath:
+          path: {{ $volume.hostPath | default $volume.mountPath }}
+          type: DirectoryOrCreate
+      {{- end }}
       {{- with .Values.nodeSelector }}
       nodeSelector:
         {{- toYaml . | nindent 8 }}
diff --git a/charts/celeborn/templates/worker/statefulset.yaml b/charts/celeborn/templates/worker/statefulset.yaml
index f8d1023..a3290ef 100644
--- a/charts/celeborn/templates/worker/statefulset.yaml
+++ b/charts/celeborn/templates/worker/statefulset.yaml
@@ -105,6 +105,10 @@ spec:
         - name: {{ $.Release.Name }}-worker-vol-{{ $index }}
           mountPath: {{ .mountPath }}
         {{- end }}
+        {{- range $index, $volume := .Values.volumes.hot_loads }}
+        - name: {{ $.Release.Name }}-worker-hot-load-vol-{{ $index }}
+          mountPath: {{ .mountPath }}
+        {{- end }}
         {{- with .Values.resources.worker }}
         resources:
           {{- toYaml . | nindent 10 }}
@@ -130,6 +134,12 @@ spec:
       {{ fail "Currently, Celeborn chart only supports 'emptyDir' and 'hostPath' volume types" }}
       {{- end }}
       {{- end }}
+      {{- range $index, $volume := .Values.volumes.hot_loads }}
+      - name: {{ $.Release.Name }}-worker-hot-load-vol-{{ $index }}
+        hostPath:
+          path: {{ $volume.hostPath | default $volume.mountPath }}
+          type: DirectoryOrCreate
+      {{- end }}
       {{- with .Values.nodeSelector }}
       nodeSelector:
         {{- toYaml . | nindent 8 }}
diff --git a/charts/celeborn/values.yaml b/charts/celeborn/values.yaml
index 37c6962..9459859 100644
--- a/charts/celeborn/values.yaml
+++ b/charts/celeborn/values.yaml
@@ -28,11 +28,11 @@ fullnameOverride: ""
 # Specifies the Celeborn image to use
 image:
   # -- Image repository
-  repository: aliyunemr/remote-shuffle-service
+  repository: celeborn
   # -- Image tag
-  tag: 0.1.1-6badd20
+  tag: latest
   # -- Image pull policy
-  pullPolicy: Always
+  pullPolicy: IfNotPresent
   # -- Image name for init containter. (your-private-repo/alpine:3.18)
   initContainerImage: alpine:3.18
 
@@ -78,6 +78,16 @@ volumes:
       hostPath: /mnt/celeborn_ratis
       type: hostPath
       capacity: 100Gi
+  hot_loads:
+    - mountPath: /opt/celeborn/master-jars
+      hostPath: /opt/celeborn/master-jars
+      type: hostPath
+    - mountPath: /opt/celeborn/worker-jars
+      hostPath: /opt/celeborn/worker-jars
+      type: hostPath
+    - mountPath: /opt/celeborn/jars
+      hostPath: /opt/celeborn/jars
+      type: hostPath
   # -- Specifies volumes for Celeborn worker pods
   worker:
     - mountPath: /mnt/disk1
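
Before installing, it can help to render the chart and check that the new hot-load volumes show up. The sketch below assumes it is run from the charts/celeborn directory with the patch applied; `count_hot_load_entries` is a hypothetical helper that simply counts matching lines in rendered manifests read from standard input.

```shell
#!/bin/sh
# Sketch: count lines that mention a hot-load volume in rendered manifests (stdin).
# count_hot_load_entries is a hypothetical helper name.
count_hot_load_entries() {
  grep -c 'hot-load-vol-'
}

# Usage (from charts/celeborn):
#   helm template celeborn . -f values.yaml | count_hot_load_entries
```

With the three hot_loads entries above, one would expect each StatefulSet to contribute three volumeMounts and three volumes; a count of zero suggests the patch or the values were not picked up.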

Install the modified Celeborn chart into the kind cluster (run from the charts/celeborn directory):

helm upgrade --install celeborn . --namespace celeborn -f values.yaml --create-namespace
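
After the install, it is convenient to block until both StatefulSets are ready. This is a sketch, assuming the StatefulSets are named celeborn-master and celeborn-worker (inferred from the pod names used in this note) and use the default RollingUpdate strategy; `wait_for_celeborn` and the 300-second timeout are illustrative choices.

```shell
#!/bin/sh
# Sketch: wait until the Celeborn master and worker StatefulSets report ready.
# wait_for_celeborn is a hypothetical helper; the StatefulSet names are assumed
# from the pod names celeborn-master-0 and celeborn-worker-0.
wait_for_celeborn() {
  ns="${1:-celeborn}"
  for sts in celeborn-master celeborn-worker; do
    kubectl rollout status "statefulset/$sts" -n "$ns" --timeout=300s || return 1
  done
}

# Usage:
#   wait_for_celeborn celeborn
```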

Debugging

Open a shell in a pod to check how it is running.

kubectl exec -it celeborn-master-0 -n celeborn -- bash
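
To check that the hot-load mounts are in place without opening an interactive shell, the mounted directories can be listed directly. A minimal sketch, assuming the three mount paths from the values.yaml change above; `show_hot_load_dirs` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list the hot-load directories inside a pod so jar timestamps can be
# compared with the local build. show_hot_load_dirs is a hypothetical helper.
show_hot_load_dirs() {
  pod="$1"
  for dir in /opt/celeborn/jars /opt/celeborn/master-jars /opt/celeborn/worker-jars; do
    echo "== $dir"
    kubectl exec "$pod" -n celeborn -- ls -l "$dir"
  done
}

# Usage:
#   show_hot_load_dirs celeborn-master-0
```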

Updating the code

After compiling, just delete the pod; the recreated pod runs the new code.

kubectl delete po celeborn-worker-0 -n celeborn
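
When a change touches several components, the same step can be repeated for a list of pods. This is a sketch that relies on the StatefulSet controller recreating each deleted pod; `restart_celeborn_pods` is a hypothetical helper name and the pod names in the usage line are examples.

```shell
#!/bin/sh
# Sketch: delete the given pods so their StatefulSets recreate them with the
# freshly built jars, then print the pod list to show the current state.
# restart_celeborn_pods is a hypothetical helper name.
restart_celeborn_pods() {
  ns="celeborn"
  for pod in "$@"; do
    kubectl delete pod "$pod" -n "$ns" || return 1
  done
  kubectl get pods -n "$ns"
}

# Usage:
#   restart_celeborn_pods celeborn-master-0 celeborn-worker-0
```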