Real-Time Semantic Segmentation on Mobile Devices
This project is an example of semantic segmentation for real-time mobile apps.
LFW (Labeled Faces in the Wild) is used as the dataset.
The goal of this project is to segment hair with reasonable accuracy and speed on mobile devices. It currently achieves 0.89 IoU.
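IoU (intersection over union) compares a predicted mask with the ground truth: the number of pixels both label as hair, divided by the number of pixels either labels as hair. Below is a minimal NumPy sketch for a single pair of binary masks. The function name `mask_iou` and the 0.5 threshold are illustrative assumptions, not taken from this repository, which may average or threshold differently.

```python
import numpy as np


def mask_iou(pred: np.ndarray, target: np.ndarray, threshold: float = 0.5) -> float:
    """IoU between a predicted probability map and a binary ground-truth mask.

    Hypothetical helper: `pred` holds per-pixel hair probabilities,
    `target` holds 0/1 labels; the 0.5 threshold is an assumption.
    """
    pred_bin = pred >= threshold
    target_bin = target.astype(bool)
    intersection = np.logical_and(pred_bin, target_bin).sum()
    union = np.logical_or(pred_bin, target_bin).sum()
    if union == 0:
        # Both masks are empty: treat as a perfect match.
        return 1.0
    return float(intersection) / float(union)


if __name__ == "__main__":
    pred = np.array([[0.9, 0.8], [0.2, 0.1]])
    target = np.array([[1, 0], [0, 0]])
    print(mask_iou(pred, target))  # 0.5
```

Here one pixel is hair in both masks and two pixels are hair in at least one, so the IoU is 1 / 2.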
More details on speed vs. accuracy are available in my post.
- Android (TODO)
- PyTorch 0.4
- CoreML for the iOS app
Currently, there is only one model in this repository, MobileUNet.py. Like a typical U-Net, it has an encoder and a decoder, both built from the depthwise convolution blocks proposed in MobileNets.
The input image is encoded down to 1/32 of its original resolution and then decoded back up to 1/2. Finally, the model scores each pixel and upsamples the result to the original size.
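The encoder/decoder layout can be illustrated with a heavily simplified PyTorch sketch. It is not MobileUNet.py: the class names `DepthwiseSeparable` and `TinyMobileUNet`, the channel widths, and the use of a plain depthwise separable block (rather than whatever exact MobileNet block the real model uses) are all assumptions. The sketch only shows how five stride-2 stages reach 1/32 resolution, four upsampling stages with skip connections return to 1/2, and a 1x1 scoring layer plus bilinear upsampling restores the original size.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DepthwiseSeparable(nn.Module):
    """Hypothetical block: depthwise 3x3 conv, then pointwise 1x1 conv (MobileNets-style)."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.block = nn.Sequential(
            # groups=in_ch gives each filter a single input channel (depthwise).
            nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU6(inplace=True),
            # 1x1 conv mixes information across channels (pointwise).
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU6(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class TinyMobileUNet(nn.Module):
    """Hypothetical toy model: encode to 1/32, decode to 1/2, score, upsample."""

    def __init__(self, widths=(16, 24, 32, 64, 96)):
        super().__init__()
        chans = (3,) + tuple(widths)
        # Five stride-2 stages: 1/2, 1/4, 1/8, 1/16, 1/32.
        self.encoder = nn.ModuleList(
            DepthwiseSeparable(chans[i], chans[i + 1], stride=2) for i in range(5)
        )
        # Four decoder stages: each upsamples and fuses the matching skip feature.
        self.decoder = nn.ModuleList(
            DepthwiseSeparable(widths[i + 1] + widths[i], widths[i]) for i in reversed(range(4))
        )
        # 1x1 conv producing one hair logit per pixel.
        self.score = nn.Conv2d(widths[0], 1, 1)

    def forward(self, x):
        size = x.shape[-2:]
        skips = []
        for stage in self.encoder:
            x = stage(x)
            skips.append(x)
        # skips[-1] is the 1/32 map; walk back up through 1/16, 1/8, 1/4, 1/2.
        for stage, skip in zip(self.decoder, reversed(skips[:-1])):
            x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear", align_corners=False)
            x = stage(torch.cat([x, skip], dim=1))
        logits = self.score(x)  # still at 1/2 resolution
        return F.interpolate(logits, size=size, mode="bilinear", align_corners=False)


if __name__ == "__main__":
    model = TinyMobileUNet().eval()
    with torch.no_grad():
        out = model(torch.rand(1, 3, 224, 224))
    print(out.shape)  # torch.Size([1, 1, 224, 224])
```

Upsampling to each skip's exact size (instead of a fixed factor of 2) keeps the toy model from failing on odd intermediate sizes; a production model may choose differently.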
Steps for training
The data is available from LFW. To get the mask images, refer to issue #11. Once you have the images and masks, arrange the face images and masks as shown below.
data/
  raw/
    images/
      0001.jpg
      0002.jpg
    masks/
      0001.ppm
      0002.ppm
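With this layout, each image can be matched to its mask by file stem (0001.jpg pairs with 0001.ppm). The stdlib-only sketch below shows that pairing, assuming the root is data/raw and every mask shares its image's stem. The function `pair_images_and_masks` is hypothetical and is not the repository's own data loader.

```python
from pathlib import Path
from typing import List, Tuple


def pair_images_and_masks(root: str = "data/raw") -> List[Tuple[Path, Path]]:
    """Return (image, mask) path pairs matched by file stem.

    Hypothetical helper; assumes images/*.jpg and masks/*.ppm under `root`.
    """
    image_dir = Path(root) / "images"
    mask_dir = Path(root) / "masks"
    masks = {p.stem: p for p in mask_dir.glob("*.ppm")}
    pairs = []
    for image in sorted(image_dir.glob("*.jpg")):
        mask = masks.get(image.stem)
        if mask is None:
            # An image without a mask cannot be used for supervised training.
            continue
        pairs.append((image, mask))
    return pairs
```

Skipping unmatched images silently is a design choice for the sketch; raising an error instead would surface missing masks earlier.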
If you use 224 x 224 as the input size, pre-trained MobileNetV2 weights are available. Download them from A PyTorch implementation of MobileNetV2 and put the weight file under
python train_unet.py \
    --img_size=224 \
    --pre_trained='weights/mobilenet_v2.pth.tar'
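Only the encoder has an ImageNet counterpart, so pre-trained MobileNetV2 weights can cover only part of the U-Net. A minimal sketch of partial loading follows, assuming the checkpoint is either a plain state dict or a dict wrapping one under a "state_dict" key, and that parameter names line up with the model's. `load_matching_weights` is hypothetical; the repository's own loading code may differ.

```python
import torch
import torch.nn as nn


def load_matching_weights(model: nn.Module, path: str) -> int:
    """Copy every checkpoint tensor whose name and shape match the model.

    Hypothetical helper. Returns how many tensors were loaded; all other
    parameters keep their random initialization (training from scratch).
    """
    checkpoint = torch.load(path, map_location="cpu")
    state = checkpoint.get("state_dict", checkpoint)
    own = model.state_dict()
    matched = {}
    for name, tensor in state.items():
        # Checkpoints saved from nn.DataParallel prefix every key with "module.".
        if name.startswith("module."):
            name = name[len("module."):]
        if name in own and own[name].shape == tensor.shape:
            matched[name] = tensor
    own.update(matched)
    model.load_state_dict(own)
    return len(matched)
```

Checking the returned count is a cheap way to notice a naming mismatch, which would otherwise silently fall back to random weights.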
If you use other input sizes, the model will be trained from scratch.
python train_unet.py --img_size=192
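Because the encoder halves the resolution five times, the choice of input size matters. Assuming the decoder upsamples by a fixed factor of 2, skip connections only line up when every halving is exact, that is, when the input size is a multiple of 32. Both 224 and 192 satisfy this. The helper below is a hypothetical pre-flight check, not part of train_unet.py. It also reports whether the MobileNetV2 weights apply, based only on this README's statement that they are available for 224 x 224.

```python
from typing import List


def encoder_sizes(img_size: int, stages: int = 5) -> List[int]:
    """Spatial size after each stride-2 stage (1/2, 1/4, ..., 1/32 by default)."""
    sizes = []
    size = img_size
    for _ in range(stages):
        # A 3x3 stride-2 conv with padding 1 maps n -> ceil(n / 2).
        size = (size + 1) // 2
        sizes.append(size)
    return sizes


def check_img_size(img_size: int) -> dict:
    """Hypothetical check combining alignment and pre-trained weight availability."""
    return {
        "encoder_sizes": encoder_sizes(img_size),
        # Assumption: a x2 decoder only matches skips when img_size % 32 == 0.
        "aligned": img_size % 32 == 0,
        # Per this README, MobileNetV2 weights exist only for 224 x 224 inputs.
        "pretrained_available": img_size == 224,
    }


if __name__ == "__main__":
    print(check_img_size(192))
    # {'encoder_sizes': [96, 48, 24, 12, 6], 'aligned': True, 'pretrained_available': False}
```

For comparison, 200 gives sizes 100, 50, 25, 13, 7; upsampling 13 by 2 yields 26, which no longer matches the 25-pixel skip feature.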
The Dice coefficient is used as the loss function.
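The Dice coefficient measures overlap as 2|P ∩ G| / (|P| + |G|), where P is the predicted mask and G the ground truth. To minimize it as a loss, a common choice is 1 minus a soft (probability-weighted) Dice score. The minimal PyTorch sketch below assumes the model outputs logits of shape (N, 1, H, W) and uses a smoothing constant of 1.0. Both the constant and the 1 - Dice form are assumptions; the repository's loss may be defined differently.

```python
import torch


def soft_dice_loss(logits: torch.Tensor, target: torch.Tensor, smooth: float = 1.0) -> torch.Tensor:
    """1 - soft Dice coefficient, averaged over the batch.

    Hypothetical helper. logits: (N, 1, H, W) raw scores;
    target: (N, 1, H, W) float tensor with values in {0, 1}.
    """
    probs = torch.sigmoid(logits)
    dims = (1, 2, 3)
    intersection = (probs * target).sum(dims)
    denominator = probs.sum(dims) + target.sum(dims)
    # `smooth` keeps the ratio defined when both masks are empty.
    dice = (2.0 * intersection + smooth) / (denominator + smooth)
    return 1.0 - dice.mean()
```

Unlike pixel-wise cross-entropy, this loss is normalized by mask size, which helps when hair covers only a small fraction of the image.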
Since the purpose of this project is to run the model on mobile devices, this repository contains scripts to convert models for iOS and Android.
- Convert a trained PyTorch model into a CoreML model for the iOS app.
- Report speed vs. accuracy on mobile devices.
- Convert the PyTorch model for Android using TensorFlow Lite.
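A CoreML conversion typically starts by freezing the trained PyTorch model into a graph. The sketch below traces the model with `torch.jit.trace` and passes it to coremltools' unified converter (`ct.convert` with `ct.ImageType`). This is a swapped-in, current-toolchain route, not necessarily what this repository's PyTorch 0.4-era scripts do. The helper names, the input name "image", and the 224 input size are assumptions. coremltools is imported inside the function, so tracing works without it installed.

```python
import torch
import torch.nn as nn


def trace_model(model: nn.Module, img_size: int = 224) -> torch.jit.ScriptModule:
    """Hypothetical helper: trace the model with a dummy image for conversion."""
    model.eval()
    example = torch.rand(1, 3, img_size, img_size)
    with torch.no_grad():
        return torch.jit.trace(model, example)


def to_coreml(traced: torch.jit.ScriptModule, img_size: int = 224):
    """Hypothetical helper: convert a traced model (requires `pip install coremltools`)."""
    import coremltools as ct  # imported lazily so tracing works without coremltools

    return ct.convert(
        traced,
        inputs=[ct.ImageType(name="image", shape=(1, 3, img_size, img_size))],
    )
```

The returned CoreML model can be written to disk with its `save()` method; recent coremltools versions produce an ML program by default, which is saved with the .mlpackage extension.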