An On-Premises, Streaming Speech Recognition System
| React Web App | ESP32-LyraT |
| --- | --- |
    docker run -it -p 8080:8080 iceychris/libreasr:latest
The output looks like this:
    make sde & make sen & make b
    make: Entering directory '/workspace'
    python3 -u api-server.py de
    make: Entering directory '/workspace'
    python3 -u api-server.py en
    make: Entering directory '/workspace'
    python3 -u api-bridge.py
    [api-bridge] running on :8080
    LM: loaded.
    LM: loaded.
    Model and Pipeline set up.
    [api-server] gRPC server running on [::]:50051 language en
    Model and Pipeline set up.
    [api-server] gRPC server running on [::]:50052 language de
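To verify the stack came up, you can probe the published port; a minimal sketch. Note that the `docker run` above only publishes port 8080 (the bridge), so that is the one reachable from the host — the gRPC ports 50051/50052 in the log stay internal to the container.

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Only 8080 is published by the `docker run` above; the gRPC ports
# (50051/50052) stay internal to the container.
print("bridge reachable:", port_open("localhost", 8080))
```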
If the output doesn't look like that, this issue might help.
Point your browser to http://localhost:8080/
Tuned language model fusion
| Model | Dataset | Network | Params | CER (dev) | WER (dev) |
| --- | --- | --- | --- | --- | --- |
While this is clearly not SotA, training the models for longer and on multiple GPUs (instead of a single RTX 2080 Ti) would yield better results.
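For reference, the CER and WER columns in the table are edit-distance metrics: the Levenshtein distance between reference and hypothesis (over characters and words respectively), normalized by reference length. A minimal, generic sketch — not LibreASR's own evaluation code:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (one-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,        # deletion
                dp[j - 1] + 1,    # insertion
                prev + (r != h),  # substitution (free if tokens match)
            )
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: char-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

print(wer("the quick brown fox", "the quick brown fax"))  # → 0.25
```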
See releases for pretrained models.
- edit create-asr-dataset.py if you use a custom dataset
- process each of your datasets using create-asr-dataset.py, e.g. `python3 create-asr-dataset.py /data/common-voice-english common-voice --lang en --workers 4`

This results in multiple asr-dataset.csv files, which will be used for training.
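Before training, it can be worth sanity-checking the generated files. The sketch below only assumes each asr-dataset.csv has a header row; the actual column names are whatever create-asr-dataset.py emits, and `/data` is just the dataset root from the example above:

```python
import csv
from pathlib import Path

def summarize_csv(path):
    """Return (header, data row count) for one asr-dataset.csv."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        return header, sum(1 for _ in reader)

# /data is the dataset root used in the example above; adjust to taste.
root = Path("/data")
if root.is_dir():
    for csv_path in root.rglob("asr-dataset.csv"):
        header, rows = summarize_csv(csv_path)
        print(f"{csv_path}: {rows} rows, columns: {header}")
```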
- edit the configuration testing.yaml to point to your data, choose transforms and tweak other settings
- adjust and run libreasr.ipynb to start training
- watch the training progress in tensorboard
- the model with the best validation loss will get saved to models/model.pth, the model with the best WER ends up in
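The two checkpoints can differ because the best-loss epoch and the best-WER epoch need not coincide. A torch-free sketch of the selection logic — the second file name below is a placeholder, not LibreASR's actual file (its name is truncated above):

```python
import math

def select_checkpoints(epoch_metrics):
    """Given per-epoch (val_loss, wer) pairs, return which epoch's weights
    each checkpoint would hold at the end of training."""
    saved = {}  # filename -> (best metric so far, epoch it was saved at)
    for epoch, (val_loss, wer) in enumerate(epoch_metrics):
        if val_loss < saved.get("models/model.pth", (math.inf,))[0]:
            saved["models/model.pth"] = (val_loss, epoch)
        # "model_best_wer.pth" is a placeholder name, not LibreASR's real file
        if wer < saved.get("models/model_best_wer.pth", (math.inf,))[0]:
            saved["models/model_best_wer.pth"] = (wer, epoch)
    return {name: epoch for name, (_, epoch) in saved.items()}

# Loss stops improving after epoch 1, but WER keeps improving through epoch 2:
print(select_checkpoints([(2.0, 0.9), (1.5, 0.8), (1.6, 0.7)]))
# → {'models/model.pth': 1, 'models/model_best_wer.pth': 2}
```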
You may also contribute by training a large model for longer.