Byte transmission problem when using big buffer #26

Open
pallix opened this issue Nov 13, 2017 · 4 comments

pallix commented Nov 13, 2017

I am transmitting a file over a serial line. I read from the serial line with an Elixir task:

  alias NervesUartEvaluation.Serial

  def read(file) do
    serial = Serial.setup_serial(:readDeviceName)
    read_loop(serial, file)
  end

  def read_loop(serial, file) do
    case Nerves.UART.read(serial, 5) do
      {:ok, ""} -> read_loop(serial, file)
      {:ok, content} ->
        IO.puts(byte_size(content))
        IO.binwrite(file, content)
        read_loop(serial, file)
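      # IO.puts/1 cannot print a tuple such as {:error, reason}; inspect it instead
      fail -> IO.inspect(fail, label: "read failed")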
    end
  end

and write the file to the serial line with another task:

  @buff_size 4098

  alias NervesUartEvaluation.Serial
  alias Nerves.UART

  def write(file) do
    serial = Serial.setup_serial(:writeDeviceName)
    write_loop(serial, file)
  end

  def write_loop(serial, file) do
    case IO.binread(file, @buff_size) do
      :eof -> UART.flush(serial); IO.puts "done"
      # reason is not a binary, so "Error" <> reason would raise; interpolate it instead
      {:error, reason} -> IO.puts("Error: #{inspect(reason)}")
      content ->
        IO.puts(byte_size(content))
        UART.write(serial, content)
        UART.drain(serial)
        write_loop(serial, file)
    end
  end

I have created the file with dd: dd if=/dev/urandom of=./input.bin bs=1024 count=683.
With @buff_size set to 4098, the file is corrupted in transit:

cmp input.bin output.bin 
input.bin output.bin differ: byte 4096, line 19

whereas a value of 4000, 2048, or 1000 works. The same byte is always the one transmitted incorrectly.

Is there a bug somewhere in the library or am I doing something wrong?
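For checking the result from within Elixir rather than with cmp, here is a minimal sketch that reports the first differing byte of two binaries. The module name `FirstDiff` and its return shapes are made up for illustration and are not part of nerves_uart. It uses a 1-based offset, the same convention as the cmp output above.

```elixir
defmodule FirstDiff do
  # Hypothetical helper, not part of nerves_uart.
  # Returns :equal for identical binaries, {:differ, n} with the 1-based
  # offset of the first differing byte, or {:eof, n} when one binary is a
  # strict prefix of the other (n is the length of the shorter one).
  def offset(a, b) when is_binary(a) and is_binary(b), do: offset(a, b, 1)

  # Same leading byte on both sides: keep walking.
  defp offset(<<x, rest_a::binary>>, <<x, rest_b::binary>>, pos),
    do: offset(rest_a, rest_b, pos + 1)

  # Both exhausted at the same time: no difference found.
  defp offset(<<>>, <<>>, _pos), do: :equal

  # Both still have data, but the leading bytes differ.
  defp offset(<<_, _::binary>>, <<_, _::binary>>, pos), do: {:differ, pos}

  # Exactly one side ran out of data.
  defp offset(_a, _b, pos), do: {:eof, pos - 1}
end
```

It could be called as `FirstDiff.offset(File.read!("input.bin"), File.read!("output.bin"))` once both tasks have finished.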

@fhunleth
Contributor

The read and write paths in the C code are coupled, so even though you have separate processes in Elixir a big write call will delay reading bytes from the OS's internal buffers. Based on your experiment, I would assume that the OS's internal buffer is 4096 bytes and when you write more than that, the OS drops the additional bytes. If the C code were actively removing bytes from the serial port while the big write was happening, this wouldn't happen.

Interestingly enough, I had a note about this coupling in the C implementation, but I had thought that it only would affect performance and since serial ports generally only operate at very slow speeds, I didn't worry about it. This is an interesting consequence of running two UARTs and looping them back on each other that I hadn't considered. I hadn't run into this use case in my own work.

This issue could certainly be fixed. It's a little tricky, though, and I don't have time at the moment to do it. I'm really glad that you pointed this out, since I bet others may run into it and it feels more legit now to spend time decoupling the read and write paths in the C code.
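Until then, and assuming the 4096-byte guess above is right (it is only a guess), one stopgap would be to keep every write at or below that size. A minimal sketch of the splitting step follows; `Chunker` and `@max_chunk` are hypothetical names for illustration, not part of nerves_uart.

```elixir
defmodule Chunker do
  # Assumption: 4096 is the OS buffer size guessed in this thread,
  # not a documented limit.
  @max_chunk 4096

  # Splits a binary into pieces of at most `size` bytes, preserving order.
  def split(bin, size \\ @max_chunk)
      when is_binary(bin) and is_integer(size) and size > 0 do
    do_split(bin, size, [])
  end

  # Nothing left to split.
  defp do_split(<<>>, _size, acc), do: Enum.reverse(acc)

  # The remainder fits in one piece.
  defp do_split(bin, size, acc) when byte_size(bin) <= size,
    do: Enum.reverse([bin | acc])

  # Take `size` bytes off the front and continue with the rest.
  defp do_split(bin, size, acc) do
    <<chunk::binary-size(size), rest::binary>> = bin
    do_split(rest, size, [chunk | acc])
  end
end
```

Each piece would then go through the same `UART.write(serial, chunk)` and `UART.drain(serial)` calls as in the write loop from the original report. Whether this avoids the corruption is untested here.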

@pallix
Author

pallix commented Nov 14, 2017

Thank you very much for your detailed answer.

Is there one buffer per physical device, or one per device in /dev? I am writing to one device in /dev and reading from another. A physical cable connects the two ports of the machine.

Does that mean that if I run the process on two machines I will not have the problem?

@fhunleth
Contributor

Hmm. Now I'm less sure. I was thinking that you were reading and writing to one device and the receive wire was connected to the transmit wire. If you have two nerves_uart GenServers running for two different devices then that messes up my theory that the problem was in the nerves_uart C port implementation.

Are you running on Linux?

Also, have you tried removing the call to UART.drain and just letting UART.write block when it has to? I have a vague recollection of a serial driver where that call had a side effect of coupling the rx and tx paths, but that was a long time ago and certainly serial device-specific.

As for two machines, I would absolutely hope that you wouldn't see this problem on two machines. If you did, then that pretty squarely points to the transmit side having a limit of sending 4K at a time. I don't see how that could be nerves_uart, but I guess that it would be something to investigate.

@pallix
Author

pallix commented Nov 14, 2017

I am running on Linux. Removing the call to drain causes the data to be corrupted.
