Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Websocket transport #81

Merged
merged 3 commits into from
Mar 25, 2024
Merged

Websocket transport #81

merged 3 commits into from
Mar 25, 2024

Conversation

Moishe
Copy link
Contributor

@Moishe Moishe commented Mar 20, 2024

This PR introduces a WebSocketTransport class, and has a sample that implements an admittedly janky Javascript client for this transport. The sample code uses Whisper to do transcription, and it seems to work, albeit slowly.

Some notes:

  • The websocket transport only supports one client at a time. We could have an array of _websockets instead of a single _websocket and do fan-out but that feels like asking for trouble and if this is meant to be a jumping-off point, I think that could be the kind of functionality we assume people will want to do on their own.
  • I implemented an audio frame aggregator to chunk up audio frames before sending them to the javascript client because I get weird clicking when playing the frames. I'm sure there's a way to not need this and stream more cleanly, but I mainly wanted to verify that I could send protobufs over the wire and decode them to audio.
  • As part of the audio frame aggregator, I added TTSStartFrame and TTSEndFrame frames that are emitted by the base TTS service. These are useful for aggregation (to know when to send a partial buffer) but we could also used them for half-duplex, to mute incoming audio frames when a TTSResponse is being played.

Some to-dos/not yet done:

  • in a separate PR, I'd like to move the transports to their own directory instead of mixing them with services.
  • I'd like to make a purely abstract transport class that defines the bare minimum functionality, and make the current BaseTransportService class a subclass of that, maybe called a "ThreadingTransportClass". Using a purely asyncio class obviated all the threading complexity that we use for the native and daily transport classes (as an aside this made me pine for a pure asyncio version of daily-python)
  • not everything in BaseTransport is implemented in the WebsocketTransport (most importantly, the conversation methods). I'll do that in a separate PR but didn't want to block this PR on that.
  • I copied the frame.proto file so it could be used by the Javascript sample code. This isn't ideal.
  • Relatedly there's no formalized method of generating the _pb2.py file from the frames.proto file
  • there's no error handling in the websocket transport yet

@Moishe Moishe force-pushed the websocket-transport branch from 336b8fc to 94db395 Compare March 21, 2024 12:08
@aconchillo aconchillo force-pushed the websocket-transport branch 2 times, most recently from e393eaa to 93e62c4 Compare March 21, 2024 18:05
@Moishe Moishe force-pushed the websocket-transport branch 3 times, most recently from 4405f7c to 695d060 Compare March 24, 2024 16:54
@Moishe Moishe marked this pull request as ready for review March 24, 2024 17:04

for (let i = 0; i < frameCount; i++) {
const word = view[i];
nowBuffering[i] = ((word + 32768) % 65536 - 32768) / 32768.0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems nowBuffering is not being used?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nm, you are modifying the contents.

self._stop_server_event.set()
break
if self._websocket and frame:
proto = frame.to_proto().SerializeToString()
Copy link
Contributor

@aconchillo aconchillo Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we would then do

proto = self._serializer.serialize(frame)

and we would have:

self._serializer = ProtobufFrameSerializer()

self._websocket = websocket
async for message in websocket:
generic_frame = frame_protos.Frame.FromString(message)
frame = FrameFromProto(generic_frame)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of the two lines above we would have:

frame = self._serializer.deserialize(message)

@aconchillo
Copy link
Contributor

aconchillo commented Mar 25, 2024

I will review more tomorrow but I'm wondering if we should have a serializer classe instead. Something like:

class FrameSerializer:

   def serialize(data):
       ...

   def deserialize() -> Frame:
       ...

and then:

class ProtobufAudioFrameSerializer(FrameSerializer):

   def serialize(data):
       ...

   def deserialize(proto) -> AudioFrame:
       return AudioFrame(data=proto.audio.audio)


class ProtobufFrameSerializer(FrameSerializer):

    def __init__(self):
        self._audio_serialize = ProtobufAudioFramSerializer()
        .....

   def serialize(data):
        ...

   def deserialize(proto) -> Frame:
        if proto.WhichOneof("frame") == "audio":
            return self._audio_serializer.deserialize(proto)
        ...

then we could even pass that to the transport:

class WebsocketTransport:

    def __init__(self, serializer = ProtobufFrameSerializer()):
       ....

but we should be able to call it with:

transport = WebsocketTransport(serializer = JsonFrameSerializer())

for example.

Or something like this.

@aconchillo
Copy link
Contributor

This looks very good. Just adding talking out loud and adding some first thoughts. 🙌 .

@Moishe
Copy link
Contributor Author

Moishe commented Mar 25, 2024

I like the Serializer idea, will play with that this morning.

@Moishe
Copy link
Contributor Author

Moishe commented Mar 25, 2024

Okay, added the serializer class. I can't tell if I went too far with the abstraction, but it now dynamically discovers the fields and transfers them.

This reduces the code, overall, and requires the protobuf field names to match the Frame field names. I think this is okay (or even good), but it made me realize that I could also dynamically create the .proto file from the Frames.py file. I didn't go that far.

@aconchillo
Copy link
Contributor

Okay, added the serializer class. I can't tell if I went too far with the abstraction, but it now dynamically discovers the fields and transfers them.

This reduces the code, overall, and requires the protobuf field names to match the Frame field names. I think this is okay (or even good), but it made me realize that I could also dynamically create the .proto file from the Frames.py file. I didn't go that far.

Nice! Yes, perfect, no need for additional ProtobufAudioFrameSerializer then.

I'll take a final look in a bit, but I think this looks great! Thank you for making the changes!

args: --exit-code -r -d -a -a src/
run: |
source .venv/bin/activate
autopep8 --max-line-length 100 --exit-code -r -d --exclude "*_pb2.py" -a -a src/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hahahaha. OK.... but what will we do in 2040?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by then my vision will be bad enough that I'll lobby we go back to 80 columns 😂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh... we should update the README with this setting.

self._n_channels = kwargs.get("n_channels", 1)
self._port = kwargs.get("port", 8765)
self._host = kwargs.get("host", "localhost")
self._audio_frame_size = kwargs.get("audio_frame_size", 16000)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This 16000 seems to match the value from WebSocketFrameProcessor, maybe a constant for both would be nice.

TextFrame, AudioFrame, TTSEndFrame, TTSStartFrame])
self._serializer = kwargs.get("serializer", ProtobufFrameSerializer())

if self._camera_enabled:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is _camera_enabled defined?

raise ValueError(
"Camera is not supported in WebsocketTransportService")

if self._speaker_enabled:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems _speaker_buffer_pending is not used. so maybe these too blocks are not needed? seems a transport should need to care about speaker or camera.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, this comes from the base class....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but i believe the comment about transport not caring about cameras or microphone still applies... This might be related to your other comment of separating transport from services. I think it makes total sense.

@aconchillo
Copy link
Contributor

LGTM! Just added a comment about a constant and some question/comment that I believe you were also suggesting.

@@ -152,6 +152,7 @@ Install the
},
"autopep8.args": [
"-a",
"-a"
"-a",
"--max-line-length=100"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the = work? If so, use that on Emacs as well.

@Moishe Moishe force-pushed the websocket-transport branch from d9a6af0 to 2bda4c3 Compare March 25, 2024 17:54
@Moishe Moishe merged commit 24fb7c5 into main Mar 25, 2024
2 checks passed
@Moishe Moishe deleted the websocket-transport branch March 29, 2024 12:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants