Cloudflare block when fetching for stream url with correct user agent #1041

Open
yongfg opened this issue Feb 12, 2025 · 7 comments

@yongfg

yongfg commented Feb 12, 2025

Background

There was a user agent issue tracked in another thread, but none of the existing user agents gave me an rtsp stream url, so I reverse engineered the app and grabbed a user agent that works on my phone.
The user agent looks like this: (iPhone15,2 18_1_1) iOS Arlo 5.4.3
I verified that this user agent works and gives me the rtsp stream url I want, so I started using it to fetch stream urls whenever a motion event is triggered, which is a pretty normal thing to do.

However

It seems that, with the same user agent, after successfully fetching the stream url a couple of times, I start to get 403 Unknown error occurred. I verified that the credentials still work. When I restart the Pyaarlo object (meaning it reloads the session file and grabs a new scraper), most of the time it works again for a couple of tries and then runs into the same problem.

I strongly suspect this is due to cloudflare. I tried refreshing the scraper upon failure and that improves the situation, but it doesn't work for every account. Accounts with more devices seem more likely to fail.
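
The refresh-on-failure approach described above could be sketched roughly as follows. This is only an illustration: `ForbiddenError`, `fetch_with_refresh`, and the two callables are hypothetical names, not part of pyaarlo or cloudscraper. `fetch` stands for whatever call requests the stream url, and `refresh` for rebuilding the Pyaarlo object or scraper.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ForbiddenError(Exception):
    """Hypothetical error raised when the stream-url request returns HTTP 403."""


def fetch_with_refresh(
    fetch: Callable[[], T],
    refresh: Callable[[], None],
    max_refreshes: int = 2,
) -> T:
    """Call fetch(); on a 403, call refresh() and try again.

    Re-raises the error once max_refreshes refreshes have been used up.
    """
    refreshes = 0
    while True:
        try:
            return fetch()
        except ForbiddenError:
            if refreshes >= max_refreshes:
                raise
            refresh()  # assumption: e.g. rebuild the Pyaarlo object / scraper
            refreshes += 1
```

Capping the number of refreshes keeps a persistent block from turning into an endless loop of new sessions.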

Any ideas, suggestions, or experience bypassing the cloudflare issue, @twrecked? I'm running out of options for now. Really appreciated!

Please let me know if I should provide more information.

@twrecked
Owner

Thank you for looking into this. I've been trying to get the rtsp stream back after the old user agent I was using was deprecated.

I'll keep playing around with this and report back.

The way I was looking at getting this to work was by adding an egressToken header into the Stream component but that was looking quite complicated to achieve.

@yongfg
Author

yongfg commented Feb 12, 2025

I found the new user agent by inspecting the traffic from my app.

I'm interested in your idea. If you can share more information about the obstacles, I can help try things out.

Also, I tried to recreate the scraper with the cookies and user agent like this (your code):
_cookies, self._user_agent = cloudscraper.get_tokens(ORIGIN_HOST)
But that doesn't seem to help.

Furthermore, I find that using proxies is necessary for me, so every rtsp stream call I'm making goes through randomly rotating proxies.

@twrecked
Owner

Using the mpeg-dash stream is easy enough; here is some code I was using to test pyaarlo.

    import os
    from urllib.parse import parse_qs, urlparse

    # `camera` is a pyaarlo camera object obtained elsewhere.
    stream_url = camera.start_stream("mac")
    print("stream-url={}".format(stream_url))

    # The egress token arrives as a query parameter of the stream URL.
    url = urlparse(stream_url)
    egress_token = parse_qs(url.query)["egressToken"][0]

    # ffmpeg expects the extra HTTP headers as one CRLF-separated string.
    print('starting ffmpeg')
    os.system(f"ffmpeg -v debug "
              f"-headers 'Egress-Token: {egress_token}\r\n"
              "Origin: https://my.arlo.com\r\n"
              "Referer: https://my.arlo.com/\r\n"
              "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 14_7_3) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/18.3 Safari/605.1.15\r\n' "
              f"-i '{stream_url}' "
              "-c copy out.mp4")

I just need a mechanism to pass those extra headers into the stream component of Home Assistant to make it work. There is a mechanism to pass in options; I think I just need to expand that.

I'm going to push a change so other people can test the user agent you found. We might be able to find a pattern for when it stops working.

@yongfg
Author

yongfg commented Feb 12, 2025

Talking about the mpeg-dash stream, I'm always getting:

https://arlostreaming21093-z2-prod.wowza.arlo.com:80/stream/AF524577E0D0C_1739335424492.mpd?egressToken=c50899ae_4e7c_453b_afa6_766a567d52eb&userAgent=web&cameraId=AF524577E0D0C_1739335424492&txnId=FE!eb34d40e-42ce-4d49-9b99-dff1daa6edb7&watchalong=true: Invalid data found when processing input

That's why I have to fall back to rtsp. Any quick thoughts? I'd really appreciate it if we can get this https stream to work, because then we'll have two options.

@twrecked
Owner

This is what I showed you. We need to pass the headers I showed you in the previous message to the stream component. I'm trying to work out how to do it.

ffmpeg needs to pass an egressToken as part of the headers when it opens the stream.
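
As a rough sketch of that requirement, the token can be pulled out of the stream URL's query string and turned into the CRLF-separated header string that ffmpeg's `-headers` option takes. The helper names `egress_headers` and `ffmpeg_command` are made up for this example; the header names and the `egressToken` query parameter come from the test code earlier in the thread.

```python
from typing import List
from urllib.parse import parse_qs, urlparse


def egress_headers(stream_url: str) -> str:
    """Build the CRLF-separated header string for ffmpeg's -headers option."""
    query = parse_qs(urlparse(stream_url).query)
    egress_token = query["egressToken"][0]  # KeyError if the URL has no token
    return (
        f"Egress-Token: {egress_token}\r\n"
        "Origin: https://my.arlo.com\r\n"
        "Referer: https://my.arlo.com/\r\n"
    )


def ffmpeg_command(stream_url: str, out_path: str = "out.mp4") -> List[str]:
    """Return an argv list suitable for subprocess.run (no shell involved)."""
    return [
        "ffmpeg",
        "-headers", egress_headers(stream_url),
        "-i", stream_url,
        "-c", "copy",
        out_path,
    ]
```

Running it with `subprocess.run(ffmpeg_command(stream_url), check=True)` passes the headers as a single argument, so the CRLF sequences do not need shell quoting.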

@twrecked twrecked self-assigned this Feb 12, 2025
@twrecked twrecked added the enhancement New feature or request label Feb 12, 2025
@twrecked
Owner

twrecked commented Feb 12, 2025

These diffs allow me to get mpeg-dash streaming working.

This diff applies to Home Assistant core.

diff --git a/homeassistant/components/stream/__init__.py b/homeassistant/components/stream/__init__.py
index 8fa4c69ac5a..51758f0ede8 100644
--- a/homeassistant/components/stream/__init__.py
+++ b/homeassistant/components/stream/__init__.py
@@ -44,6 +44,7 @@ from .const import (
     ATTR_SETTINGS,
     ATTR_STREAMS,
     CONF_EXTRA_PART_WAIT_TIME,
+    CONF_HTTP_HEADERS,
     CONF_LL_HLS,
     CONF_PART_DURATION,
     CONF_RTSP_TRANSPORT,
@@ -166,6 +167,8 @@ def _convert_stream_options(
         pyav_options["rtsp_transport"] = rtsp_transport
     if stream_options.get(CONF_USE_WALLCLOCK_AS_TIMESTAMPS):
         pyav_options["use_wallclock_as_timestamps"] = "1"
+    if headers := stream_options.get(CONF_HTTP_HEADERS):
+        pyav_options[CONF_HTTP_HEADERS] = headers
 
     # For RTSP streams, prefer TCP
     if isinstance(stream_source, str) and stream_source[:7] == "rtsp://":
@@ -624,5 +627,6 @@ STREAM_OPTIONS_SCHEMA: Final = vol.Schema(
         vol.Optional(CONF_RTSP_TRANSPORT): vol.In(RTSP_TRANSPORTS),
         vol.Optional(CONF_USE_WALLCLOCK_AS_TIMESTAMPS): bool,
         vol.Optional(CONF_EXTRA_PART_WAIT_TIME): cv.positive_float,
+        vol.Optional(CONF_HTTP_HEADERS): cv.string,
     }
 )
diff --git a/homeassistant/components/stream/const.py b/homeassistant/components/stream/const.py
index c81d2f6cb18..d6b96deef5c 100644
--- a/homeassistant/components/stream/const.py
+++ b/homeassistant/components/stream/const.py
@@ -60,6 +60,7 @@ RTSP_TRANSPORTS = {
 }
 CONF_USE_WALLCLOCK_AS_TIMESTAMPS = "use_wallclock_as_timestamps"
 CONF_EXTRA_PART_WAIT_TIME = "extra_part_wait_time"
+CONF_HTTP_HEADERS = "headers"
 
 
 class StreamClientError(IntEnum):

This is for the aarlo piece:

diff --git a/custom_components/aarlo/camera.py b/custom_components/aarlo/camera.py
index 9f4aa9e..0ef985d 100644
--- a/custom_components/aarlo/camera.py
+++ b/custom_components/aarlo/camera.py
@@ -13,6 +13,7 @@ import logging
 import voluptuous as vol
 from collections.abc import Callable
 from haffmpeg.camera import CameraMjpeg
+from urllib.parse import urlparse, parse_qs
 
 import homeassistant.helpers.config_validation as cv
 from homeassistant.components import websocket_api
@@ -517,6 +518,23 @@ class ArloCam(Camera):
 
         return attrs
 
+    def _stream_source(self, user_agent):
+        """Return the source of the stream.
+
+        This set stream_options if the stream is https so we can pass egress
+        token on.
+        """
+        self.stream_options = {}
+        stream_url = self._camera.get_stream(user_agent)
+        if stream_url is not None:
+            if stream_url.startswith("https"):
+                url = urlparse(stream_url)
+                egress_token = parse_qs(url.query)["egressToken"][0]
+                self.stream_options = {
+                    "headers": f"Egress-Token: {egress_token}\r\n"
+                }
+        return stream_url
+
     async def stream_source(self):
         """Return the source of the stream.
 
@@ -524,11 +542,11 @@ class ArloCam(Camera):
         to the original Arlo one. This means we get a `rtsps` stream back which the stream
         component can handle.
         """
-        return await self.hass.async_add_executor_job(self._camera.get_stream, "arlo")
+        return await self.hass.async_add_executor_job(self._stream_source, "linux")
 
     async def async_stream_source(self, user_agent=None):
         return await self.hass.async_add_executor_job(
-            self._camera.get_stream, user_agent
+            self._stream_source, user_agent
         )
 
     def camera_image(

edit: removed the manifest changes

@yongfg
Author

yongfg commented Feb 12, 2025

Great, the mpeg-dash streaming works. Thank you for the help.

I also want to share what I found while investigating the cloudflare issue. It doesn't seem to be related to the user agent. With any of linux, mac or arlo, after requesting the stream url 6-9 times for the same device, I start to get 403s. I tried making the request pattern a bit more random (random wait times, random retries, etc.) but that doesn't help without refreshing the cloudscraper. After all, cloudflare is protecting the endpoint, so the block happens before we even get the stream url.
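
Given that observation, one option would be to refresh the scraper proactively, before a device reaches the point where the 403s start. The sketch below is untested against Arlo and rests on two assumptions: that a refresh resets whatever cloudflare is counting, and that a limit of 5 requests per device stays under the observed 6-9. `RefreshBudget` and its `refresh` callable are hypothetical names, not part of pyaarlo.

```python
from collections import defaultdict
from typing import Callable, DefaultDict


class RefreshBudget:
    """Count stream-url requests per device and refresh before a limit is hit."""

    def __init__(self, refresh: Callable[[], None], limit: int = 5) -> None:
        self._refresh = refresh  # assumption: rebuilds the scraper/session
        self._limit = limit
        self._counts: DefaultDict[str, int] = defaultdict(int)

    def before_request(self, device_id: str) -> None:
        """Call this right before each stream-url request for device_id."""
        if self._counts[device_id] >= self._limit:
            self._refresh()
            self._counts.clear()  # assumed: a new session resets every counter
        self._counts[device_id] += 1
```

If the block turns out to be tied to the account or IP rather than the session, this counter would not help and the limit would need to be reconsidered.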
