
Unable to download taxi data from chapter 3 #1

Open
ajcastany opened this issue Jul 21, 2023 · 3 comments

Comments

@ajcastany

ajcastany commented Jul 21, 2023

Running the provided Chapter 3 (page 59) taxi_data_prep.sh script in WSL fails with a 403 Access Denied error. The URL https://s3.amazonaws.com/nyc-tlc/trip+data/ also returns 403 Access Denied when opened in a browser.

How can I get access to the data?

--2023-07-21 15:42:25-- https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2020-04.csv
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.217.202.160, 54.231.233.40, 52.217.133.64, ...
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.217.202.160|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-07-21 15:42:26 ERROR 403: Forbidden.
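
The failing request can be reproduced outside the script to confirm where the 403 comes from. The sketch below is a hypothetical helper (the names `tripdata_url`, `http_status` and `BASE_URL` are illustrative, not part of the book's repository) that rebuilds the URL pattern seen in the wget log and sends a HEAD request using only Python's standard library. Since the browser also gets a 403, a 403 here would point at access to the bucket rather than at WSL or wget.

```python
import urllib.error
import urllib.request

# URL prefix copied from the wget log above (the old "trip+data" layout).
BASE_URL = "https://s3.amazonaws.com/nyc-tlc/trip+data"


def tripdata_url(year: int, month: int) -> str:
    """Build the yellow-taxi CSV URL for a given year and month."""
    return f"{BASE_URL}/yellow_tripdata_{year}-{month:02d}.csv"


def http_status(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request and return the HTTP status code (e.g. 200 or 403)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; keep the status code.
        return err.code
```

Under the conditions reported in this issue, `http_status(tripdata_url(2020, 4))` would be expected to return 403, matching the log; the actual result depends on the bucket's access settings at the time it is run.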

@MertHoc
Collaborator

MertHoc commented Jul 21, 2023

Hello. Thanks for bringing this to our attention. It looks like the NYC Taxi public dataset changed its directory structure. We have updated the scripts to reflect the new structure. Please pull the latest changes, try again, and let us know if you are still having trouble.

@ajcastany
Author

I tried running the updated script and got:

Downloading yellow_tripdata_2018-01.csv from nyc-tlc
fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
Performing gzip on yellow_tripdata_2018-01.csv
gzip: yellow_tripdata_2018-01.csv: No such file or directory

I cannot list the bucket either. Running aws s3 ls nyc-tlc gives:

An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied
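
The download log also shows the failure cascading: the download step is denied, yet gzip still runs on a file that was never written. A minimal sketch of a guard, assuming a Python version of that step (the helper `gzip_if_present` is hypothetical and not part of taxi_data_prep.sh), compresses the file only when it actually exists, so the real cause (the 403) is not buried under a second error:

```python
import gzip
import os
import shutil


def gzip_if_present(path: str) -> bool:
    """Compress `path` to `path + '.gz'` and delete the original.

    Returns False without doing anything if the file is missing, for
    example because the preceding download was denied with a 403.
    """
    if not os.path.isfile(path):
        print(f"Skipping gzip: {path} was not downloaded")
        return False
    with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)  # mirror the gzip CLI, which replaces the input file
    return True
```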

@ajcastany
Author

I managed to get access to the S3 bucket. In the IAM policy for Chapter_3, I had to add the bucket ARNs to the resources for s3:GetObject, etc.:

arn:aws:s3:::nyc-tlc and arn:aws:s3:::nyc-tlc/*

in this section: https://github.com/PacktPublishing/Serverless-Analytics-with-Amazon-Athena/blob/main/chapter_3/iam_policy_chapter_3.json#L45
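
Both ARNs are needed because IAM authorizes bucket-level actions such as s3:ListBucket (used by ListObjectsV2 and aws s3 ls) against the bucket ARN, and object-level actions such as s3:GetObject (also required for HeadObject) against the object ARN ending in /*. The sketch below builds such a statement as a Python dict; the function names and action list are assumptions for illustration, not a copy of iam_policy_chapter_3.json.

```python
import json


def s3_read_statement(bucket: str) -> dict:
    """Return a hypothetical IAM statement granting read access to one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Effect": "Allow",
        "Action": ["s3:ListBucket", "s3:GetObject"],
        "Resource": [
            bucket_arn,         # matched by bucket-level actions (ListBucket)
            f"{bucket_arn}/*",  # matched by object-level actions (GetObject)
        ],
    }


def policy_document(*statements: dict) -> str:
    """Wrap statements in a policy document and serialize it as JSON."""
    doc = {"Version": "2012-10-17", "Statement": list(statements)}
    return json.dumps(doc, indent=2)
```

Putting both actions and both ARNs in one statement is valid, since IAM matches each action only against the resource type it applies to; splitting them into two statements is an equally common, tighter alternative.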
