This repository has been archived by the owner on Dec 7, 2020. It is now read-only.

Gatekeeper prevents streaming output #645

Open
ghost opened this issue Jun 16, 2020 · 4 comments


ghost commented Jun 16, 2020


Summary

I have a PHP-based website with a feature that sends partial output to the requester while the script is still running. Below is a simple PHP example of this logic.

<?php
echo "starting script ... you should see output every 1 second<br>\n";

for ($i = 1; $i <= 50; $i++) {
    echo "index: $i; ob_level: " . ob_get_level() . "; ob_length: " . ob_get_length() . "<br>\n";

    if ($i % 10 === 0) {
        // Push everything buffered so far to the client, then pause.
        flush();
        ob_flush();
        sleep(1);
    }
}

When Gatekeeper is used in front of this website, the requester only sees the complete output at the end, rather than the parts as they are flushed.

Environment

  • I'm using the docker image: quay.io/keycloak/keycloak-gatekeeper:latest
  • Louketo: 10.0.0

Expected Results

I would like a way to configure Gatekeeper to forward parts of a website's output to the client as they arrive, instead of only the complete output.

Actual Results

See the summary above.

Steps to reproduce

  • Let a web server, e.g. Apache, serve the above-mentioned PHP test script.
  • Configure Gatekeeper to protect the test script.
  • Open a browser and
    • call the test script directly on Apache: new lines are printed every second;
    • call the test script via Gatekeeper: only the complete output of the script is displayed, all at once.

Additional Information

During my research I found a similar issue related to Nginx, where Nginx's internal buffer prevents this kind of logic from working. Maybe the cause is similar here as well.


Beanow commented Aug 13, 2020

Had a similar, related issue with "streaming" requests.

Use case

Using MJPEG-streamer to expose a webcam.

The way this "stream" is implemented (HTTP handler here) is by sending a
Content-Type: multipart/x-mixed-replace;boundary=...
multipart response body, then sending an unbounded number of
Content-Type: image/jpeg parts as frames for as long as the connection is open.
The upstream server flushes response body data whenever it has buffered a frame.

This particular software exposes such a stream on GET /?action=stream.
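
For illustration, a minimal Go sketch of such a handler (the real mjpg-streamer is C; frameSource is a hypothetical stub standing in for the camera):

package main

import (
    "fmt"
    "net/http"
    "time"
)

// frameSource is a hypothetical stub standing in for the camera;
// it returns one encoded JPEG frame per call.
func frameSource() []byte {
    return []byte{0xFF, 0xD8, 0xFF, 0xD9} // just the JPEG start/end markers
}

// mjpegHandler writes an unbounded multipart/x-mixed-replace body,
// one image/jpeg part per frame, as long as the client stays connected.
func mjpegHandler(w http.ResponseWriter, r *http.Request) {
    const boundary = "frame"
    w.Header().Set("Content-Type", "multipart/x-mixed-replace;boundary="+boundary)
    flusher, _ := w.(http.Flusher)
    for {
        frame := frameSource()
        fmt.Fprintf(w, "--%s\r\nContent-Type: image/jpeg\r\nContent-Length: %d\r\n\r\n", boundary, len(frame))
        if _, err := w.Write(frame); err != nil {
            return // client disconnected
        }
        fmt.Fprint(w, "\r\n")
        if flusher != nil {
            flusher.Flush() // push this frame downstream immediately
        }
        time.Sleep(100 * time.Millisecond) // pace the sketch at ~10 fps
    }
}

func main() {
    http.HandleFunc("/", mjpegHandler)
    http.ListenAndServe(":8080", nil)
}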

Expected result

Louketo is able to proxy this request as an unbuffered, infinite response body.

Actual result

While buffering is not an obvious issue here, the stream is closed after 10 seconds.

A workaround is to set --server-write-timeout=0s, which disables the timeout.

Reproducing

Requirements

  • Docker + Docker Compose
  • A V4L2 device on /dev/video0.
  • An OIDC client for testing that allows http://localhost:3000/* redirect URLs.

Start up this compose file with docker-compose up.

# docker-compose.yml
version: '3.7'
services:
  oidc-gate:
    image: quay.io/louketo/louketo-proxy
    command: >-
      --server-write-timeout=0s
      --upstream-url=http://webcam:80
      --listen=:3000
      --enable-default-deny=true
      --discovery-url=https://example-keycloak/auth/realms/local-testing
      --client-id=local-app
      --client-secret=12345678-1234-1234-1234-123456789012
      --encryption-key=AgXa7xRcoClDEU0ZDSH4X0XhL5Qy2Z2j
    ports:
      - 3000:3000

  webcam:
    image: sixsq/mjpg-streamer
    devices:
      # Streams from a V4L2 camera. Like a laptop/usb webcam.
      - /dev/video0
    ports:
      # Runs on :80 internally,
      # we expose 8080 for testing without auth.
      - 8080:80

  1. GET http://localhost:8080/?action=stream should serve the upstream server's MJPEG stream correctly.
  2. GET http://localhost:3000/?action=stream should first require authentication, then stream indefinitely.
  3. Change the --server-write-timeout=0s option to --server-write-timeout=3s and docker-compose up the changes.
  4. GET http://localhost:3000/?action=stream will end the stream after 3 seconds.

Additional Information

Other reverse proxy setups do not require changing such a timeout in the first place.
For example, I've also proxied this through Caddy, where the reverse_proxy directive needs the flush_interval -1 option.

The proxy buffers responses by default for wire efficiency:

  • flush_interval is a duration value that defines how often Caddy should flush the buffered response body to the client. Set to -1 to disable buffering.

This is closer to the problem @pahrens is observing, because Caddy will disable buffering of the response this way.
Setting this to -1 also avoids any timeout issues. Caddy will just happily send an infinite response body.

Which makes a lot of sense: no buffers means no latency, no congestion (for the proxy), and no memory hogging.
It becomes the upstream's and the client's problem to set sane timeouts.
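
For reference, a minimal Caddyfile sketch of that setup (listen and upstream addresses are illustrative):

# Caddyfile (v2); addresses are illustrative
localhost:3000 {
    reverse_proxy localhost:8080 {
        flush_interval -1
    }
}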


Beanow commented Aug 13, 2020

I've poked a little bit at the provided PHP example.

Using a repro similar to the one I shared for my use case:

# docker-compose.yml
version: '3.7'
services:
  oidc-gate:
    image: quay.io/louketo/louketo-proxy
    command: >-
      --server-write-timeout=0s
      --upstream-url=http://php:80
      --listen=:3000
      --enable-default-deny=true
      --discovery-url=https://example-keycloak/auth/realms/local-testing
      --client-id=local-app
      --client-secret=12345678-1234-1234-1234-123456789012
      --encryption-key=AgXa7xRcoClDEU0ZDSH4X0XhL5Qy2Z2j
    ports:
      - 3000:3000

  php:
    image: php:apache
    volumes:
      - ./stream-example.php:/var/www/html/index.php
    ports:
      - 8080:80

Used tech

The way PHP streams the response here is with Transfer-Encoding: chunked.
PHP handles this encoding for you through the flush functions.
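
For context on reading the raw capture below: chunked framing puts each chunk's byte size in hexadecimal on its own line, followed by the chunk data, and the body ends with a zero-size chunk. For example, "Wikipedia" could be sent as:

4
Wiki
5
pedia
0

(each size line and data segment separated by CRLF on the wire).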

Additionally, my PHP response included Content-Encoding: gzip and was reported as 596 B, compressed.

Observations

The proxied response also reports Transfer-Encoding: chunked, but no gzip.
The reported size is 2.20 KB. So it would appear the proxy has decompressed for us.

Using a curl --raw request with the needed auth cookies shows that it also doesn't have the original chunks.
Instead I got 3 chunks: 0x800 (2048 bytes), 4 bytes, and the terminating 0-byte chunk.
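
The request was along these lines (a sketch; assuming Gatekeeper's default kc-access session cookie, token elided):

curl --raw --cookie "kc-access=<token>" http://localhost:3000/

The raw body came back as: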

800
starting script ... you should see output every 1 second<br>
index: 1; ob_level: 0; ob_length: <br>
index: 2; ob_level: 0; ob_length: <br>
index: 3; ob_level: 0; ob_length: <br>
index: 4; ob_level: 0; ob_length: <br>
index: 5; ob_level: 0; ob_length: <br>
index: 6; ob_level: 0; ob_length: <br>
index: 7; ob_level: 0; ob_length: <br>
index: 8; ob_level: 0; ob_length: <br>
index: 9; ob_level: 0; ob_length: <br>
index: 10; ob_level: 0; ob_length: <br>
index: 11; ob_level: 0; ob_length: <br>
index: 12; ob_level: 0; ob_length: <br>
index: 13; ob_level: 0; ob_length: <br>
index: 14; ob_level: 0; ob_length: <br>
index: 15; ob_level: 0; ob_length: <br>
index: 16; ob_level: 0; ob_length: <br>
index: 17; ob_level: 0; ob_length: <br>
index: 18; ob_level: 0; ob_length: <br>
index: 19; ob_level: 0; ob_length: <br>
index: 20; ob_level: 0; ob_length: <br>
index: 21; ob_level: 0; ob_length: <br>
index: 22; ob_level: 0; ob_length: <br>
index: 23; ob_level: 0; ob_length: <br>
index: 24; ob_level: 0; ob_length: <br>
index: 25; ob_level: 0; ob_length: <br>
index: 26; ob_level: 0; ob_length: <br>
index: 27; ob_level: 0; ob_length: <br>
index: 28; ob_level: 0; ob_length: <br>
index: 29; ob_level: 0; ob_length: <br>
index: 30; ob_level: 0; ob_length: <br>
index: 31; ob_level: 0; ob_length: <br>
index: 32; ob_level: 0; ob_length: <br>
index: 33; ob_level: 0; ob_length: <br>
index: 34; ob_level: 0; ob_length: <br>
index: 35; ob_level: 0; ob_length: <br>
index: 36; ob_level: 0; ob_length: <br>
index: 37; ob_level: 0; ob_length: <br>
index: 38; ob_level: 0; ob_length: <br>
index: 39; ob_level: 0; ob_length: <br>
index: 40; ob_level: 0; ob_length: <br>
index: 41; ob_level: 0; ob_length: <br>
index: 42; ob_level: 0; ob_length: <br>
index: 43; ob_level: 0; ob_length: <br>
index: 44; ob_level: 0; ob_length: <br>
index: 45; ob_level: 0; ob_length: <br>
index: 46; ob_level: 0; ob_length: <br>
index: 47; ob_level: 0; ob_length: <br>
index: 48; ob_level: 0; ob_length: <br>
index: 49; ob_level: 0; ob_length: <br>
index: 50; ob_level: 0; ob_length: <
4
br>

0

Now,

Transfer-Encoding is a hop-by-hop header, that is applied to a message between two nodes, not to a resource itself. Each segment of a multi-node connection can use different Transfer-Encoding values. If you want to compress data over the whole connection, use the end-to-end Content-Encoding header instead.

This suggests that removing the chunks and buffering is perfectly within spec.
However, decompressing Content-Encoding: gzip and removing that header is not allowed by the spec.
Relates to: #642


Beanow commented Aug 15, 2020

Digging into this more, I found the issue.

Cause: default flushing behaviour

The proxy dependency goproxy does not control how flushing is done, so it defaults to what the Go standard library implements for both the upstream and downstream connections.

https://github.com/elazarl/goproxy/blob/0581fc3aee2d07555835bed1a876aca196a4a511/proxy.go#L180
The io.Copy of the body here will copy the data as soon as it's available (each chunk PHP flushes), but by default net/http buffers response writes and only sends them downstream once an internal buffer fills, regardless of how long it takes to fill that buffer.

Flushing manually

To flush sooner, the http.ResponseWriter may also implement http.Flusher and we can explicitly call .Flush().
See https://stackoverflow.com/a/30603654

That would flush whatever we have in the buffer so far, which may result in chunks of different sizes than PHP originally gave us. (Though that's acceptable per the HTTP spec.)
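
A minimal sketch of that pattern, mimicking the PHP script (handler and route are illustrative, not Louketo code):

package main

import (
    "fmt"
    "net/http"
    "time"
)

// slowHandler emits lines over time; the explicit Flush pushes each
// batch out of net/http's response buffer as its own chunk.
func slowHandler(w http.ResponseWriter, r *http.Request) {
    flusher, ok := w.(http.Flusher)
    if !ok {
        http.Error(w, "streaming unsupported", http.StatusInternalServerError)
        return
    }
    for i := 1; i <= 50; i++ {
        fmt.Fprintf(w, "index: %d\n", i)
        if i%10 == 0 {
            flusher.Flush() // send whatever is buffered so far
            time.Sleep(time.Second)
        }
    }
}

func main() {
    http.HandleFunc("/", slowHandler)
    http.ListenAndServe(":3000", nil)
}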

Adopting flush_interval from Caddy

Caddy is also written in Go, so we can compare implementations. Rather than io.Copy, they use this:
https://github.com/caddyserver/caddy/blob/e385be922569c07a0471a6798d4aeaf972facb5b/modules/caddyhttp/reverseproxy/streaming.go#L126

Which may be an interesting addition to goproxy. But I also managed to implement this as middleware when using goproxy directly: by wrapping the response writer and implementing io.ReaderFrom, io.Copy will use our implementation, which can be based on Caddy's flushing rules.
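
A minimal sketch of that wrapping idea, flushing after every upstream read rather than on Caddy's timed interval (package and type names are illustrative):

package middleware

import (
    "io"
    "net/http"
)

// flushingWriter wraps the downstream http.ResponseWriter. Because it
// implements io.ReaderFrom, io.Copy(flushingWriter{w}, resp.Body) will
// call ReadFrom and use this loop instead of its default buffered copy.
type flushingWriter struct {
    http.ResponseWriter
}

func (fw flushingWriter) ReadFrom(src io.Reader) (n int64, err error) {
    buf := make([]byte, 32*1024)
    for {
        nr, rerr := src.Read(buf) // returns as soon as upstream flushes a chunk
        if nr > 0 {
            nw, werr := fw.Write(buf[:nr])
            n += int64(nw)
            if werr != nil {
                return n, werr
            }
            if f, ok := fw.ResponseWriter.(http.Flusher); ok {
                f.Flush() // forward the chunk downstream immediately
            }
        }
        if rerr == io.EOF {
            return n, nil
        }
        if rerr != nil {
            return n, rerr
        }
    }
}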

I'll have a go at porting that to louketo next.

Beanow added a commit to Beanow/louketo-proxy that referenced this issue Aug 15, 2020
TODO:
- SIGSEGV spotted when using 2s default and `make test`
- Unsure about Apache 2 & BSD-3 license compatibility as used
Beanow added a commit to Beanow/louketo-proxy that referenced this issue Aug 16, 2020
TODO:
- Unsure about Apache 2 & BSD-3 license compatibility as used
- Test for streaming, streaming + gzip

Beanow commented Sep 7, 2020

I got part of the way there with my WIP; feel free to check that out and use it.
But given #683's sunsetting of the project, I won't put in the work to turn it into a PR.
