Please refer to the official docs at kubeflow.org.
Quick Links
Get Involved
Please refer to the Community page.
Category: Linux / Miscellaneous | Watchers: 369 | Star: 12.2k | Fork: 2.1k | Last update: Feb 1, 2023
Implements case 1 and case 2 of the new kfctl semantics (see the design doc reference).
Current workflow for kfctl:
New semantics, single-command install:
# move into empty directory
kfctl apply -f <path to config file / URL>
New semantics build, edit, apply:
# move into empty directory
kfctl build -f <path to config file / URL>
kfctl apply
Old semantics:
kfctl init kf-app --config <path to config file / URL>
cd kf-app
kfctl generate <platform>
kfctl apply <platform>
/cc @yanniszark @jlewi @kkasravi
Related to: #3518
Trying to install Kubeflow v0.7 on an on-prem cluster (Kubernetes v15.7 / Ubuntu 18.04), but getting the error below:
[email protected]:/opt/kubeflow/kf-test# kfctl apply -V -f ${CONFIG_FILE}
INFO[0000]
Notice anonymous usage reporting enabled using spartakus
To disable it
If you have already deployed it run the following commands:
  cd $(pwd)
  kubectl -n ${K8S_NAMESPACE} delete deploy -l app=spartakus
For more info: https://www.kubeflow.org/docs/other-guides/usage-reporting/
  filename="coordinator/coordinator.go:120"
INFO[0000] Deleting cachedir /opt/kubeflow/kf-test/.cache/manifests because Status.ReposCache is out of date filename="kfconfig/types.go:464"
INFO[0000] Fetching https://github.com/kubeflow/manifests/archive/v0.7-branch.tar.gz to /opt/kubeflow/kf-test/.cache/manifests filename="kfconfig/types.go:485"
Error: failed to build kfApp from URI /opt/kubeflow/kf-test/kfctl_existing_arrikto.yaml: couldn't generate KfApp: (kubeflow.error): Code 500 with message: could not sync cache. Error: (kubeflow.error): Code 400 with message: couldn't download URI https://github.com/kubeflow/manifests/archive/v0.7-branch.tar.gz Error Error opening a gzip reader for /tmp/getter093015857/archive: EOF
Usage:
  kfctl apply -f ${CONFIG} [flags]
Flags:
  -f, --file string   Static config file to use. Can be either a local path:
                      export CONFIG=./kfctl_gcp_iap.yaml
                      or a URL:
                      export CONFIG=https://raw.githubusercontent.com/kubeflow/manifests/v0.7-branch/kfdef/kfctl_gcp_iap.0.7.0.yaml
                      export CONFIG=https://raw.githubusercontent.com/kubeflow/manifests/v0.7-branch/kfdef/kfctl_existing_arrikto.0.7.0.yaml
                      export CONFIG=https://raw.githubusercontent.com/kubeflow/manifests/v0.7-branch/kfdef/kfctl_aws.0.7.0.yaml
                      export CONFIG=https://raw.githubusercontent.com/kubeflow/manifests/v0.7-branch/kfdef/kfctl_k8s_istio.0.7.0.yaml
                      kfctl apply -V --file=${CONFIG}
  -h, --help          help for apply
  -V, --verbose       verbose output default is false
failed to build kfApp from URI /opt/kubeflow/kf-test/kfctl_existing_arrikto.yaml: couldn't generate KfApp: (kubeflow.error): Code 500 with message: could not sync cache. Error: (kubeflow.error): Code 400 with message: couldn't download URI https://github.com/kubeflow/manifests/archive/v0.7-branch.tar.gz Error Error opening a gzip reader for /tmp/getter093015857/archive: EOF
The Kubeflow installation was working fine when I tried it a few days ago. I am now trying to install it in a new cluster and getting this error. Can you please help?
Thanks, Job Varkey
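A possible first diagnostic for the error above, offered as a sketch rather than as anything kfctl does itself: an EOF while opening a gzip reader may indicate that the downloaded archive was empty or cut short. The helper below classifies a payload that has already been downloaded. The name `classify_archive` and the four result labels are hypothetical and chosen only for illustration.

```python
import gzip
import io
import zlib


def classify_archive(data: bytes) -> str:
    """Classify a downloaded payload as 'empty', 'not-gzip', 'corrupt', or 'ok'.

    Hypothetical helper for illustration; not part of kfctl.
    """
    if not data:
        return "empty"
    # Every gzip stream starts with the two magic bytes 0x1f 0x8b.
    if data[:2] != b"\x1f\x8b":
        return "not-gzip"
    try:
        # Decompress the whole stream to catch truncation and CRC errors.
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as fh:
            while fh.read(1 << 16):
                pass
    except (OSError, EOFError, zlib.error):
        return "corrupt"
    return "ok"


if __name__ == "__main__":
    sample = gzip.compress(b"manifest contents")
    print(classify_archive(sample))
    print(classify_archive(b""))
```

Reading the saved archive with `open(path, "rb").read()` and passing the bytes to this function would show whether the download itself is the problem before looking at the kfctl configuration.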
/kind process
We need to identify who will be driving the 1.1 release. These folks should then
This would probably be a good opportunity to update some of the processes and policies around releases. https://github.com/kubeflow/kubeflow/blob/master/docs_dev/releasing.md
Area | release czar | Tracking Issue
--- | --- | ---
aws | @Jeffwan | ~~#5057~~
centraldashboard | | ~~#5068~~
docs | | kubeflow/website#1984
fairing | @jinchihe | ~~kubeflow/fairing#503~~
feast | @woop |
gcp | @jlewi | kubeflow/gcp-blueprints#46
katib | @andreyvelich | kubeflow/katib#1211
kfctl | @krishnadurai, @crobby | kubeflow/kfctl#352
kfserving | @yuzisun @animeshsingh | kubeflow/kfserving#648
manifests | @krishnadurai | kubeflow/manifests#1252
metadata | |
minikf | @vkoukis |
multiuser | @yanniszark @bmorphism | #5067, #5068
notebooks | @kimwnasptd, @jtfogarty | #5060, #5068
pipelines | @Bobgy | kubeflow/pipelines#3961
training | @johnugeorge @andreyvelich @Jeffwan | kubeflow/common#97
Target Dates
We'd like to make it super easy to go from writing code in a notebook to training that model distributed.
Experience might be something like
Under the hood this would cause
I think the biggest challenge is that we probably don't want to execute all of the code in the notebook. Typically, some refactoring is needed to convert a notebook into a Python module suitable for execution in a batch job.
As a concrete example
Here's the notebook for our GitHub Issue summarization example
Here's the corresponding python module used when training in a K8s job.
The python module only executes a subset of cells in particular those to
Rather than try to auto-convert a notebook like the GitHub issue example, I think we should require users to structure their code to facilitate the conversion.
My suggestion would be to allow any functions defined in the notebook to be used as entry points. So for the GitHub issues summarization the user would have a cell like the following
import numpy as np
from keras.callbacks import CSVLogger, ModelCheckpoint

def train_model(output):
    # seq2seq_Model, encoder_input_data, decoder_input_data and
    # decoder_target_data are defined in earlier notebook cells.
    script_name_base = 'tutorial_seq2seq'
    csv_logger = CSVLogger('{:}.log'.format(script_name_base))
    # Keep only the checkpoint with the best validation loss.
    model_checkpoint = ModelCheckpoint(
        '{:}.epoch{{epoch:02d}}-val{{val_loss:.5f}}.hdf5'.format(script_name_base),
        save_best_only=True)
    batch_size = 1200
    epochs = 7
    history = seq2seq_Model.fit([encoder_input_data, decoder_input_data],
                                np.expand_dims(decoder_target_data, -1),
                                batch_size=batch_size,
                                epochs=epochs,
                                validation_split=0.12, callbacks=[csv_logger, model_checkpoint])
    seq2seq_Model.save(output)

train_model('seq2seq_model_tutorial.h5')
If users structure their code this way, we should be able to manually create and invoke a suitable container entry point. Something like the following
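To make the idea concrete, here is a minimal sketch of one way such an entry point could be generated. It is an illustration under stated assumptions, not an existing Kubeflow or Jupyter tool. It reads the notebook's JSON, keeps only top-level imports and function or class definitions from code cells, and appends a `__main__` block that calls the chosen function with the command-line arguments. The names `extract_definitions` and `build_entry_point` are hypothetical, and decorators on functions are not preserved in this simple version.

```python
import ast
import json


def extract_definitions(notebook_json: str) -> str:
    """Return source holding only imports and def/class statements from code cells."""
    notebook = json.loads(notebook_json)
    kept = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat stores source as a list of lines
            source = "".join(source)
        try:
            tree = ast.parse(source)
        except SyntaxError:
            # Cells that are not plain Python (e.g. IPython magics) are skipped.
            continue
        for node in tree.body:
            if isinstance(node, (ast.Import, ast.ImportFrom,
                                 ast.FunctionDef, ast.ClassDef)):
                kept.append(ast.get_source_segment(source, node))
    return "\n\n".join(kept) + "\n"


def build_entry_point(notebook_json: str, function_name: str) -> str:
    """Build module source that calls function_name with the CLI arguments."""
    main_block = (
        "\n\nif __name__ == '__main__':\n"
        "    import sys\n"
        f"    {function_name}(*sys.argv[1:])\n"
    )
    return extract_definitions(notebook_json) + main_block


if __name__ == "__main__":
    demo = json.dumps({"cells": [
        {"cell_type": "code",
         "source": ["def train_model(output):\n", "    print('saving to', output)\n"]},
        {"cell_type": "code", "source": ["train_model('seq2seq_model_tutorial.h5')\n"]},
    ]})
    print(build_entry_point(demo, "train_model"))
```

The top-level call in the last cell is dropped, so only the function definition and the generated entry point end up in the module. Exploratory cells never run inside the container.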
A variant of this idea would be to use metaml (by @wbuchwalter). metaml uses metaparticle to let people annotate their Python code with the information needed to run it on K8s (e.g. distributed using TFJob). If we took this approach I think the flow would be
@willingc @yuvipanda Is there existing tooling in the Jupyter community other than nbconvert to convert notebooks to code suitable for asynchronous batch execution?
/cc @wbuchwalter @gaocegege @yuvipanda @willingc
We should have a Kubeflow logo.
/kind bug
We just merged the AuthService update for KF 1.7 RC0: https://github.com/kubeflow/manifests/pull/2150. But in this commit the logout URL is /authservice/logout instead of /logout. Since not everyone might be using AuthService, we can't hardcode this value, so we'll need a dynamic mechanism for learning this URL.
I'd propose to
add a LOGOUT_URL environment variable in the Deployment of the app
expose it through the /api/workgroup/env-info route handler https://github.com/kubeflow/kubeflow/blob/master/components/centraldashboard/app/api_workgroup.ts
This way users can dynamically configure the app based on the authentication service that they use.
Bumps http-cache-semantics from 4.1.0 to 4.1.1.
2449650 Update mocha
560b2d8 Don't use regex to trim whitespace
b1bdb92 Remove linting package zoo
c20dc7e Cache 308
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
You can trigger Dependabot actions by commenting on this PR:
@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot merge will merge this PR after your CI passes on it
@dependabot squash and merge will squash and merge this PR after your CI passes on it
@dependabot cancel merge will cancel a previously requested merge and block automerging
@dependabot reopen will reopen this PR if it is closed
@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
@dependabot use these labels will set the current labels as the default for future PRs for this repo and language
@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language
You can disable automated security fix PRs for this repo from the Security Alerts page.
/kind question
We are running multi-tenant Kubeflow 1.6.1 in our cluster, and we would like to change the StorageClass used by the PVCs that the pipelines and notebooks create, depending on the team. We would also like to prevent teams from selecting StorageClasses that belong to other teams.
Is there any way to get this behaviour with configuration?
This PR addresses https://github.com/kubeflow/kubeflow/pull/6916#issuecomment-1408865920 and is part of https://github.com/kubeflow/kubeflow/issues/6766.
Right now, the KF integration tests do not work when we update the manifests for RCs. The sed command does not work since it tries to replace the latest tag, which doesn't exist in the release branches.
Changes:
Use the kustomize edit image <curr_img>=<new_img>:<tag> command (https://github.com/kubernetes-sigs/kustomize/blob/master/examples/image.md) to set the appropriate tag in the manifests before they are applied.
Signed-off-by: Apostolos Gerakaris [email protected]
KF 1.6 release 🎉
KF 1.5 release 🎉
First RC of the KF 1.5 release 🎉
Web apps:
Internationalization progress:
Central Dashboard:
Jupyter web app
Notebooks
TensorBoards
PodDefaults
Other improvements
The first RC for the 1.4 release