The `AclNet` model is designed to perform sound classification.
The `AclNet` model is trained on an internal dataset of environmental sounds.
For details about the model, see this paper.
The model input is a segment of PCM audio samples in [N, C, 1, L] format.
The model output for `AclNet` is the sound classifier output for the 53 different environmental sound classes from the internal sound database.
| Metric | Value |
|---|---|
| Type | Classification |
| GFLOPs | 1.4 |
| MParams | 2.7 |
| Source framework | PyTorch* |
See this publication and this paper.
Audio, name - `0`, shape - `1, 1, 1, L`, format is `N, C, 1, L`, where:
- `N` - batch size
- `C` - channel
- `L` - number of PCM samples (minimum value is 16000)
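The input layout described above can be sketched in plain Python. This is a minimal, dependency-free illustration, not the model's reference preprocessing: the helper names `to_model_input` and `shape_of` are hypothetical, and zero-padding clips shorter than 16000 samples is an assumption made here only to satisfy the stated minimum length.

```python
MIN_SAMPLES = 16000  # minimum value of L stated in the input description


def to_model_input(samples, min_samples=MIN_SAMPLES):
    """Wrap a mono PCM sequence into a nested list of shape [1, 1, 1, L].

    Assumption: clips shorter than min_samples are zero-padded at the end.
    The source only states the minimum length, not how to reach it.
    """
    data = [float(s) for s in samples]
    if len(data) < min_samples:
        data.extend([0.0] * (min_samples - len(data)))
    # Three wrapping levels give N=1, C=1 and the fixed singleton dimension.
    return [[[data]]]


def shape_of(nested):
    """Return the dimensions of a regular nested list."""
    dims = []
    while isinstance(nested, list) and nested:
        dims.append(len(nested))
        nested = nested[0]
    return dims


clip = [0] * 8000             # half a second of silence at 16 kHz (illustrative)
blob = to_model_input(clip)
print(shape_of(blob))         # [1, 1, 1, 16000]
```

In practice the nested list would typically be converted to a float tensor by whatever inference runtime is used; that step is omitted here because it depends on the runtime.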
Sound classifier (see labels), name - `203`, shape - `1, 53`, output data format is `N, C`, where:
- `N` - batch size
- `C` - predicted softmax scores for each class in [0, 1] range
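Reading the `1, 53` output amounts to ranking the softmax scores in each batch row. The sketch below shows one way to do that; `top_k` is a hypothetical helper, the score values are made up for illustration, and the optional `labels` list stands in for the label file, whose contents are not reproduced here.

```python
def top_k(scores, k=3, labels=None):
    """Return the k highest (class, score) pairs for one batch row.

    `labels` is optional; without it, the class index is returned.
    """
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)[:k]
    return [(labels[i] if labels else i, score) for i, score in ranked]


# Fabricated output of shape [1, 53], used only to demonstrate the ranking.
output = [[0.0] * 53]
output[0][7] = 0.9
output[0][21] = 0.1

for row in output:            # iterate over the batch dimension N
    print(top_k(row, k=2))    # [(7, 0.9), (21, 0.1)]
```

Because the scores are softmax outputs, the top score can be read as the model's relative confidence in that class among the 53 classes.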
The original model is distributed under Apache License, Version 2.0.