Contents
- 1 Overview
- 2 Deploy
- 2.1 Local Deployment
- 2.2 init process
- 2.3 generate process
- 2.4 apply process
- 2.5 Successful deployment
- 2.6 Delete
- 3 Issues to be noted
- 4 Reasons for deployment failure
- Appendix
Kubeflow = Kubernetes + Machine Learning + Flow
1 Overview
Kubeflow is a tool set for running machine learning tasks on a K8S cluster. It provides computing frameworks for TensorFlow, PyTorch and other machine/deep learning tasks, and it integrates the container workflow engine Argo as Pipeline. The latest version has many problems with local deployment; most issues on GitHub are deployment related, so anyone not deploying on GCP may run into a variety of problems.
The reason GCP is supported so well is that Kubeflow is an open-source version of Google's internal machine learning workflow. However, few core developers are invested in it; only a handful of people do version updates and bug fixes.
Before deploying, learn a few ksonnet concepts:
- registry: ksonnet's template warehouse; it can be offline or online, as long as it is reachable
- env: registries are registered under an env, and switching env switches the warehouse of deployment templates
- pkg: a package in the registry, containing prototypes and libraries
- prototype: a template, from which the components this article talks about are generated; different params can be configured for it
- library: contains the K8S API information; different K8S versions have different APIs
- param: parameters used to fill templates
- component: a template filled with parameters; this is what this article calls a component
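The concepts above map onto ks subcommands roughly as follows (a sketch only; the registry path, server URL and component name here are illustrative, and a working ks binary plus a reachable cluster are assumed):

```shell
# registry: add a template warehouse (an offline path or an online URL)
ks registry add kubeflow /opt/kubeflow-repo/kubeflow

# pkg: install a package (prototypes + libraries) from the registry
ks pkg install kubeflow/argo

# prototype -> component: generate fills a prototype with params
ks generate argo argo

# param: adjust a component's parameters after generation
ks param set argo namespace kubeflow

# env: pick the target cluster (and thus the k8s API version), then apply
ks env add default --server=https://my-cluster:6443
ks apply default -c argo
```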
2 Deploy
Kubeflow's official documentation provides deployment solutions for various platforms.
https://www.kubeflow.org/docs/started/
For deployment, Kubeflow builds on ksonnet, a tool that makes managing K8S yaml easier.
The deployment script provided by Kubeflow wraps ks commands, and when it fails you have to read them, so it is necessary to familiarize yourself with ks first (I will walk through examples later).
2.1 Local Deployment
First of all, find out which components the one-click script deploys. It is worth spending some time understanding each component; otherwise, when something goes wrong, you will have no idea where to start.
```shell
# ks component list  (the notes column is my supplementary information and may not be accurate)
COMPONENT            TYPE     NOTES
===================  =======  =====
ambassador           jsonnet  Kubeflow's unified gateway: authentication and routing
application          jsonnet  there are too many components; this CRD integrates them
argo                 jsonnet  container task scheduling
centraldashboard     jsonnet  Kubeflow's entrance UI
jupyter              jsonnet  jupyter
jupyter-web-app      jsonnet  jupyter hub
katib                jsonnet  hyperparameter tuning for deep learning
metacontroller       jsonnet  another internal CRD
notebook-controller  jsonnet  a CRD; more than one notebook can be added
notebooks            jsonnet  jupyter notebook
openvino             jsonnet
pipeline             jsonnet  pipeline integration
profiles             jsonnet  user permissions and authentication
pytorch-operator     jsonnet  operator for the PyTorch deep learning framework
spartakus            jsonnet
tensorboard          jsonnet
tf-job-operator      jsonnet  CRD for tensorflow tasks
```
Deploying so many components is very cumbersome. The official project provides a script, but the script alone is not enough: if there is a problem, you must read its contents. Let's briefly go through its structure.
You have to make sure the correct version is downloaded, otherwise debugging becomes yet another headache.
There are three folders after downloading. Focus on the scripts folder: the key to deployment is in two scripts, kfctl.sh and util.sh.
The scripts are long, and all platforms (gcp/aws/minikube) are mixed together, so the focus is still on the ks part, because the core of the deployment is ksonnet.
As for ksonnet, its classic diagram (above) illustrates this well: the template is a cylinder with some pieces missing, such as image and meta.name in the yaml, and params are the abstraction of those missing pieces. ksonnet fills the cylinder up, combining everything into a complete yaml file, which can then be applied with kubectl apply -f xxx.yaml or with the ksonnet command ks apply -c <component>.
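The filling step can be sketched as a toy analogy in plain shell (this is not ksonnet's real mechanism, which uses jsonnet; the template file, placeholder names and image tag below are all made up for illustration):

```shell
# A yaml "cylinder" with holes: __NAME__ and __IMAGE__ play the role of params
cat > /tmp/deploy.template <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: __NAME__
spec:
  template:
    spec:
      containers:
      - name: __NAME__
        image: __IMAGE__
EOF

# "Filling the template with params" produces a complete manifest
sed -e 's/__NAME__/ambassador/g' \
    -e 's|__IMAGE__|quay.io/datawire/ambassador:0.37.0|g' \
    /tmp/deploy.template > /tmp/deploy.yaml

cat /tmp/deploy.yaml
```

The rendered /tmp/deploy.yaml is what would go to kubectl apply; ksonnet does the same combination, just with real jsonnet templates instead of sed.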
Grep the key ks-related commands.
```shell
# Run this command to find the ks-related commands
cat util.sh | grep "ks"

# All deployments call this function to create a common ksonnet app.
# Create the ksonnet app
# Initialize the ks project directory; note that ${KS_INIT_EXTRA_ARGS} is mentioned later
eval ks init $(basename "${KUBEFLOW_KS_DIR}") --skip-default-registries ${KS_INIT_EXTRA_ARGS}

# Also very important: delete the default environment, which proxies ks's registry
ks env rm default

# registry; covered together with the concepts in the previous sections
ks registry add kubeflow "${KUBEFLOW_REPO}/kubeflow"

# From here on, the various ks component templates are installed;
# on their own they do nothing -- you still have to run generate
ks pkg install kubeflow/argo
ks pkg install kubeflow/pipeline
ks pkg install kubeflow/common
ks pkg install kubeflow/examples
ks pkg install kubeflow/jupyter
ks pkg install kubeflow/katib
ks pkg install kubeflow/mpi-job
ks pkg install kubeflow/pytorch-job
ks pkg install kubeflow/seldon
ks pkg install kubeflow/tf-serving
ks pkg install kubeflow/openvino
ks pkg install kubeflow/tensorboard
ks pkg install kubeflow/tf-training
ks pkg install kubeflow/metacontroller
ks pkg install kubeflow/profiles
ks pkg install kubeflow/application
ks pkg install kubeflow/modeldb

# The generate command fills parameters into templates to form the complete yaml described above
```
```shell
# Note that these components do not correspond one-to-one with the templates above,
# because some components contain parameters from several templates
ks generate pytorch-operator pytorch-operator
ks generate ambassador ambassador
ks generate openvino openvino
ks generate jupyter jupyter
ks generate notebook-controller notebook-controller
ks generate jupyter-web-app jupyter-web-app
ks generate centraldashboard centraldashboard
ks generate tf-job-operator tf-job-operator
ks generate tensorboard tensorboard
ks generate metacontroller metacontroller
ks generate profiles profiles
ks generate notebooks notebooks
ks generate argo argo
ks generate pipeline pipeline
ks generate katib katib
# cd ks_app
# ks component rm spartakus

# The generate command can also take parameters
ks generate spartakus spartakus --usageId=${usageId} --reportUsage=true
ks generate application application
```
What really creates the K8S resources from the yaml is kfctl.sh. In that script, the ks-related commands can be found the same way.
```shell
# Run the command
cat kfctl.sh | grep "ks"

# Here the components that the application needs to include are specified.
# As mentioned above, application is a CRD: Kubeflow has too many components,
# so there has to be a tool for unified management.
ks param set application components '['$KUBEFLOW_COMPONENTS']'

#### Here is the last key step in the script -- pay attention! ####
# ks show combines components to generate yaml files
ks show default -c metacontroller -c application > default.yaml

# So even something as complex as Kubeflow is ultimately built with kubectl apply.
# If you need to, be sure to look at the default.yaml file.
# It has a lot of content; depending on the version, roughly 5000 to 9000 lines.
kubectl apply --validate=false -f default.yaml
```
P.S. The ks commands are not all listed here; if you need to debug, read the script carefully.
2.2 init process
Initialize through the script:

```shell
./kfctl.sh init myapp
```
After init, check the version.
2.3 generate process
```shell
# Note the directory
cd myapp
../kfctl.sh generate all
```
After generating, the same version information can be checked.
2.4 apply process
```shell
# Note the directory
../kfctl.sh apply all
```
2.5 Successful deployment
Check the pod situation.
View the svc situation.
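For instance (assuming everything was deployed into the kubeflow namespace, as the script does by default):

```shell
# all pods should eventually reach the Running state
kubectl get pods -n kubeflow

# the services, including the ambassador gateway used below
kubectl get svc -n kubeflow
```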
Access UI.
```shell
kubectl port-forward svc/ambassador 8003:80 -n kubeflow
```
Check Pipeline.
Run a Pipeline DAG.
Check tf-job-dashboard.
Submit a tf-job. There are several examples in the official examples package; these components can be installed as follows.
```shell
# Note: you need the ks_app directory generated earlier
ks generate tf-job-simple-v1beta2 tf-job-simple-v1beta2
ks apply default -c tf-job-simple-v1beta2
```
In this way, tasks are submitted: essentially, the yaml is generated through ks, and ks apply is then equivalent to kubectl apply.
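In other words, the submission above is roughly equivalent to rendering the yaml yourself and applying it by hand (a sketch, assuming the ks_app directory generated earlier; the tf-job.yaml filename is arbitrary):

```shell
cd ks_app
# render the component into plain yaml, then hand it to kubectl
ks show default -c tf-job-simple-v1beta2 > tf-job.yaml
kubectl apply -f tf-job.yaml
```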
2.6 Delete
```shell
# Note the directory
../kfctl.sh delete all
```
3. Issues to be noted
- Be sure to confirm the Kubeflow version when downloading/installing it, because versions differ greatly!
- When generating templates, pay attention to the K8S version! You can specify it in the script; see the Appendix.
If you don't plan to deploy the entire Kubeflow, you can deploy only Jupyter, tf-operator, and so on.
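A partial deployment can be done from the generated ks_app directory by applying only the components you need (a sketch; the component names must match what ks component list shows, and this particular selection is only an example):

```shell
cd ks_app
# deploy only the gateway, the jupyter stack and the tf-job operator
ks apply default -c ambassador -c jupyter -c notebook-controller -c tf-job-operator
```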
4 Reasons for deployment failure
- A full deployment creates many K8S resources and needs considerable capacity; a local deployment is not necessarily feasible. GCP recommends 16 cores.
- Version issues, including the K8S version, the ksonnet version, image versions, etc.
- Offline issues: in principle, as long as K8S is already deployed, the images are available locally, and the deployment scripts have been fetched, no networked deployment is required.
Common problems include GitHub being inaccessible, the need to download K8S's swagger.json file, and so on.
The cost of fully deploying Kubeflow is too high. First, the official documentation is not organized clearly enough and is not updated promptly. Second, it contains too many components, and if you are unfamiliar with some of them it is very difficult to track down problems. If you do deploy, it is best to go through a cloud vendor: Kubeflow's deployment scripts for vendors are maintained more actively than those for local users. And on GCP, of course, the experience should be the best.
Appendix
```shell
# ks needs to read the .kube/config file.
# init needs to reach a ks registry; an offline install requires k8s's swagger.json
eval ks init $(basename "${KUBEFLOW_KS_DIR}") --skip-default-registries --api-spec=file:/tmp/swagger.json

# You can specify --server to determine the k8s version
ks env add default --server=https://shmix1.k8s.so.db --api-spec=file:/tmp/swagger.json

# Note the information from each script run
++ ks env describe default
+ O='name: default
kubernetesversion: v1.14.3
path: default
destination:
  server: https://kubernetes.docker.internal:6443
  namespace: kubeflow
targets: []
libraries: {}'

#### Full deployment script ####
export KUBEFLOW_VERSION=v0.5.0
curl https://raw.githubusercontent.com/kubeflow/kubeflow/${KUBEFLOW_VERSION}/scripts/download.sh | sh
cd scripts
./kfctl.sh init myapp
cd myapp
../kfctl.sh generate all
../kfctl.sh apply all
```