This repo is the official codebase for the PVLDB paper "Efficient task-specific data valuation for nearest neighbor algorithms".
It contains scripts to compute the exact Shapley value (in
exact_sp.py) and an approximate Shapley value based on locality-sensitive hashing, or LSH (in
LSH_sp.py), for the K-nearest-neighbor (KNN) classifier.
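For an unweighted KNN classifier, the exact Shapley value does not require enumerating subsets. It follows from a closed-form recursion over the training points sorted by their distance to a test point. The sketch below is a minimal NumPy illustration of that recursion. It assumes Euclidean distance and a utility equal to the fraction of the K nearest neighbors that share the test label. It is not the code in exact_sp.py, and the function names `knn_shapley_single` and `knn_shapley` are hypothetical.

```python
import numpy as np


def knn_shapley_single(X_train, y_train, x_test, y_test, K):
    """Exact Shapley values of all training points for ONE test point.

    Assumption (illustrative sketch): unweighted KNN with Euclidean distance,
    utility U(S) = (1/K) * #{labels equal to y_test among the min(K, |S|)
    nearest points of S}.
    """
    N = len(y_train)
    # Rank training points from nearest (rank 1) to farthest (rank N).
    dists = np.linalg.norm(X_train - x_test, axis=1)
    order = np.argsort(dists, kind="stable")
    match = (y_train[order] == y_test).astype(float)

    s = np.zeros(N)
    # Farthest point: s_{alpha_N} = 1[y_{alpha_N} == y_test] / N
    s[order[N - 1]] = match[N - 1] / N
    # Recurse from rank N-1 down to rank 1 (i is the 1-based rank):
    # s_{alpha_i} = s_{alpha_{i+1}} + (match_i - match_{i+1}) / K * min(K, i) / i
    for i in range(N - 1, 0, -1):
        s[order[i - 1]] = s[order[i]] + (match[i - 1] - match[i]) / K * min(K, i) / i
    return s


def knn_shapley(X_train, y_train, X_test, y_test, K):
    """Average the per-test-point values over a test set."""
    return np.mean(
        [knn_shapley_single(X_train, y_train, x, y, K) for x, y in zip(X_test, y_test)],
        axis=0,
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(100, 5))
    y_train = rng.integers(0, 3, size=100)
    X_test = rng.normal(size=(10, 5))
    y_test = rng.integers(0, 3, size=10)
    values = knn_shapley(X_train, y_train, X_test, y_test, K=5)
    print(values.shape)
```

Each test point costs one sort, so the cost is O(N log N) per test point. By the efficiency property, the values for a single test point sum to the utility of the full training set.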
We also provide two examples showing how to compute the exact Shapley value (in
exact_sp_example.py) and the approximate Shapley value (in
LSH_sp_example.py) on the CIFAR-10 dataset.
In the reproduction folder, we provide our Jupyter notebooks for three datasets (CIFAR-10, ImageNet, and YFCC100M). They record our experimental results and can help you reproduce our experiments.
If you have any questions about our code, please do not hesitate to open an issue. Thanks!