Chrome not reachable after the first execution in lambda #131
-
First off, thank you for the great article on how to run Selenium in AWS Lambda. I followed the instructions outlined here and was able to get the Lambda to run once.
If I try again after 20 minutes or so I get another successful run, which is then followed by unsuccessful runs. Here is my code -
I've tried increasing the memory size to 3008 MB, and that had no impact either. What's also weird is that it fails at different points on different runs. I suspect Chrome is crashing, but I have no clue why or how to get around this. Any suggestions would be greatly appreciated! Full error message -
Replies: 2 comments
-
Finally figured out what was going on. Apparently consecutive Lambda executions can share a single volume mounted at /tmp, with a default size of 512 MB, and it was getting filled up with the data produced by the crawler. Increasing the size to 3 GB fixed it for me. I also implemented a workaround to clean up the data after each execution run. In case this helps anyone, here is the code for the driver initialization -
And the cleanup code in the exit function -
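(The cleanup code itself didn't paste above, so here is a minimal sketch of what an exit-time /tmp cleanup can look like. The `clean_tmp` name and its `tmp_dir` parameter are illustrative, not the actual code from this thread.)

```python
import os
import shutil


def clean_tmp(tmp_dir="/tmp"):
    """Remove everything the crawler left behind in the shared /tmp volume.

    Lambda may reuse the same container (and its /tmp contents) across
    invocations, so leftover browser profiles and downloads accumulate
    until the volume fills up.
    """
    for entry in os.listdir(tmp_dir):
        path = os.path.join(tmp_dir, entry)
        try:
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
        except OSError:
            # Some entries (e.g. files still held open by Chrome)
            # may resist deletion; skip them rather than crash.
            pass
```

Calling something like this at the end of the handler (or in an exit hook) keeps warm containers from inheriting a full /tmp.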
-
Glad you were able to find a solution. You are correct: sometimes Lambda reuses the container it created for an invocation, so it's good practice to clean up /tmp between runs. **You can confirm this with the experiment below.**

**Testing multiple consecutive invocations**

I ran a simple Python program that writes the current timestamp to a file in /tmp:

```python
import json
import os
import time


def lambda_handler(event, context):
    time_stamp = time.strftime("%H%M%S")
    with open(f"/tmp/{time_stamp}.txt", "w") as f:
        f.write(f"time_stamp = {time_stamp}")
    with open("/tmp/timestamps.txt", "a") as f:
        f.write(time_stamp)
        f.write("\n")
    print("Contents of tmp folder")
    os.system("ls -la /tmp")
    print("Contents of file")
    os.system("cat /tmp/timestamps.txt")
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
```

On warm starts, the timestamp files written by earlier invocations are still listed in /tmp.
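For the fix mentioned earlier in the thread, the /tmp volume size can be raised above the 512 MB default with the function's ephemeral-storage setting (the size is given in MB, up to 10,240). A hedged example using the AWS CLI; the function name is a placeholder:

```shell
# Raise the function's /tmp volume from the 512 MB default to 3 GB.
# "my-crawler-function" is a placeholder; substitute your function's name.
aws lambda update-function-configuration \
  --function-name my-crawler-function \
  --ephemeral-storage '{"Size": 3072}'
```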