Sherpa ONNX - Speech Models

📡 API Endpoints

POST /api/v1/speech/stt - Speech-to-Text

curl -X POST [API_URL]/api/v1/speech/stt \
  -H "X-API-Key: your-key" \
  -F "file=@audio.wav"

POST /api/v1/speech/diarization - Speaker Diarization

curl -X POST [API_URL]/api/v1/speech/diarization \
  -H "X-API-Key: your-key" \
  -F "file=@audio.wav"

🎤 Upload Sherpa ONNX STT Models

Upload Speech-to-Text models (.onnx, .txt files) to app/models/sherpa-onnx/

⚠️ File Name Requirements:

tokens.txt - Vocabulary/token mappings (exact name required)
encoder.onnx - Encoder model (exact name required)
decoder.onnx - Decoder model (exact name required)
joiner.onnx - Joiner model (exact name required)

All four files must be uploaded with exactly these names. The system will not work if any file is named differently.
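Because the names must match exactly, it can help to check a local model directory before uploading. The following is a minimal sketch, assuming the four files sit together in one local directory; the function name check_stt_models is hypothetical and not part of the application.

```shell
# Hypothetical helper: verify that a local directory contains the four
# STT files under the exact names the upload form requires.
check_stt_models() {
  dir="$1"
  missing=0
  for name in tokens.txt encoder.onnx decoder.onnx joiner.onnx; do
    if [ ! -f "$dir/$name" ]; then
      # Report each required file that is absent or named differently.
      echo "missing: $name"
      missing=1
    fi
  done
  # Exit status 0 when all four files are present, 1 otherwise.
  return "$missing"
}
```

Usage: check_stt_models ./my-model && echo "ready to upload"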

Supports: .onnx, .txt files (tokens.txt, encoder.onnx, decoder.onnx, joiner.onnx)

👥 Upload Sherpa ONNX Diarization Models

Upload Speaker Diarization models (.onnx files) to app/models/sherpa-onnx/diarization/

⚠️ File Name Requirements:

Option 1 (Recommended - Two files):

encoder.onnx - Encoder model (exact name required)
decoder.onnx - Decoder model (exact name required)

Option 2 (Single file - experimental):

model.onnx - Single model file (will be used for both encoder and decoder)

Note: Standard Sherpa ONNX diarization uses two separate models, one for segmentation and one for speaker embedding. If you have only one file, you can upload it as model.onnx to try the single-file option. For best results, download both models (encoder.onnx and decoder.onnx) from the Sherpa ONNX releases.
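The two options above can be checked locally before uploading. This is a minimal sketch, assuming the files sit in one local directory; the function name diarization_layout and the labels it prints are hypothetical, not something the application defines.

```shell
# Hypothetical helper: report which diarization layout a local directory
# matches, preferring the recommended two-file option.
diarization_layout() {
  dir="$1"
  if [ -f "$dir/encoder.onnx" ] && [ -f "$dir/decoder.onnx" ]; then
    echo "two-file"      # Option 1 (recommended)
  elif [ -f "$dir/model.onnx" ]; then
    echo "single-file"   # Option 2 (experimental)
  else
    echo "incomplete"    # neither option is satisfied
    return 1
  fi
}
```

Usage: diarization_layout ./my-diarization-model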

Supports: .onnx files (encoder.onnx and decoder.onnx, or model.onnx)