
I have a problem with dbt_hook writing to logs: permission denied #33

Open
ravog opened this issue Apr 19, 2021 · 18 comments
Labels
question Further information is requested

Comments

@ravog

ravog commented Apr 19, 2021

No description provided.

@andrewrjones
Contributor

Hi @ravog,

This operator logs in the same way as any other Airflow task, and doesn't try to write any additional logs anywhere else.

Are your other Airflow tasks writing logs ok? Is it just this task/operator?

Regards,
Andrew
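
(To illustrate Andrew's point: the hook behind this operator essentially shells out to the dbt CLI and streams dbt's own stdout into the Airflow task log. The sketch below is only illustrative of that pattern, not the actual airflow-dbt source, and the class name is made up. It also shows why a "Permission denied: 'logs/dbt.log'" message like the one reported below is dbt itself failing to write its own log file inside the project directory, rather than Airflow failing to write its task log.)

```python
# Illustrative sketch only (not the real airflow-dbt code): run the dbt CLI as a
# subprocess and forward each line of its output to the Airflow task log.
import subprocess

from airflow.hooks.base import BaseHook


class SketchDbtHook(BaseHook):  # hypothetical class name
    def run_cli(self, *command, dbt_bin="dbt", project_dir="."):
        cmd = [dbt_bin, *command]
        self.log.info("Output:")
        proc = subprocess.Popen(
            cmd,
            cwd=project_dir,              # dbt resolves log-path/target-path relative to this
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        for line in proc.stdout:
            self.log.info(line.rstrip())  # dbt's own messages (including its errors) land here
        proc.wait()
        self.log.info("Command exited with return code %s", proc.returncode)
        if proc.returncode != 0:
            raise RuntimeError("dbt command failed")
```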

@andrewrjones andrewrjones added the question Further information is requested label Apr 20, 2021
@dkrylovsb

The author might be referring to the error message I've just run into when using airflow-dbt with AWS MWAA:

[2021-08-05 15:05:37,950] {{dbt_hook.py:109}} INFO - Output:
[2021-08-05 15:05:40,340] {{dbt_hook.py:113}} INFO - Running with dbt=0.20.0
[2021-08-05 15:05:40,594] {{dbt_hook.py:113}} INFO - Encountered an error:
[2021-08-05 15:05:40,617] {{dbt_hook.py:113}} INFO - [Errno 13] Permission denied: 'logs/dbt.log'
[2021-08-05 15:05:40,722] {{dbt_hook.py:117}} INFO - Command exited with return code 2
[2021-08-05 15:05:40,759] {{taskinstance.py:1150}} ERROR - dbt command failed

@Falydoor
Contributor

Falydoor commented Sep 2, 2021

I had to update the value of log-path in my dbt_project.yml (https://docs.getdbt.com/reference/project-configs/log-path) with something like /usr/local/airflow/tmp/logs in order to run on AWS MWAA.

@prakash260

Hi,

I was able to fix the permission denied error as @Falydoor suggested, but now I am getting a read-only file system error when dbt writes its partial parsing file:

[2021-09-07 00:32:19,011] {{dbt_hook.py:117}} INFO - /usr/local/airflow/.local/bin/dbt run --profiles-dir /usr/local/airflow/dags/dbt1/
[2021-09-07 00:32:19,045] {{dbt_hook.py:126}} INFO - Output:
[2021-09-07 00:32:20,899] {{dbt_hook.py:130}} INFO - Running with dbt=0.20.1
[2021-09-07 00:32:22,972] {{dbt_hook.py:130}} INFO - Encountered an error:
[2021-09-07 00:32:23,184] {{dbt_hook.py:130}} INFO - [Errno 30] Read-only file system: 'target/partial_parse.msgpack'
[2021-09-07 00:32:23,214] {{dbt_hook.py:134}} INFO - Command exited with return code 2

Please let me know if anyone has an answer for this.

@Falydoor
Contributor

Falydoor commented Sep 7, 2021

Hey @prakash260,

Try updating the target-path property too (https://docs.getdbt.com/reference/project-configs/target-path) with /usr/local/airflow/tmp/target for example.

There may be a better way than using a temp folder, such as disabling dbt's logs/target generation entirely.

@prakash260

Thanks @Falydoor. I'm also not keen on relying on a temp folder, but I will see whether it works.

@prakash260

OK, I have tried changing the location to /usr/local/airflow/dags/{dbt-directory} and everything is working now.
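
For later readers, here is a minimal DAG sketch of this setup (the dbt binary and profiles paths are taken from the logs quoted above; the DAG id and schedule are placeholders, and the project's dbt_project.yml is assumed to already point log-path/target-path at writable locations as discussed in this thread):

```python
from datetime import datetime

from airflow import DAG
from airflow_dbt.operators.dbt_operator import DbtRunOperator

with DAG(
    dag_id="dbt_run_example",        # placeholder
    start_date=datetime(2021, 9, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    dbt_run = DbtRunOperator(
        task_id="dbt_run",
        dbt_bin="/usr/local/airflow/.local/bin/dbt",   # dbt binary path seen in the logs above
        dir="/usr/local/airflow/dags/dbt1",            # dbt project directory synced from S3
        profiles_dir="/usr/local/airflow/dags/dbt1/",  # --profiles-dir used in the logs above
    )
```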

@maker100

@prakash260 could you please elaborate more on {dbt_directory}?
In my case I tried:
target-path: "/usr/local/airflow/dags/dbt_target"

but with no success. Does MWAA have write access to /usr/local/airflow/dags/?
I read in the AWS documentation (https://docs.aws.amazon.com/mwaa/latest/userguide/mwaa-faqs.html#custom-image) that only temporary storage is writable:
Your Apache Airflow Operators can store temporary data on the Workers. Apache Airflow Workers can access temporary files in the /tmp on the Fargate containers for your environment.
but that doesn't seem like a good approach.

@prakash260

prakash260 commented Oct 27, 2021

hey @maker100,

That particular location gets picked up from S3 as part of MWAA, which is why I was forced to store the project files there.
I did try those options and felt that using dbt Cloud is much easier than customizing all of this,
but the target-path change did work for me with dbt Core.

To be more precise, your dbt project files need to be present in the S3 location for this to work.

@Gatsby-Lee

So, what is the right approach?
What should be done to handle the permission issue?

@maker100

Hi @Gatsby-Lee,

Because of several issues with using dbt installed directly on MWAA, such as:

  • Python library issues
  • dbt path-parsing issues when the dbt models were stored in a custom path
  • long plugin upload times with MWAA, since the models were stored in plugins.zip

I decided to use a separate environment and run dbt on the AWS Batch service using an ECR image (a rough sketch follows below). You could also use Kubernetes for this.

I recommend using MWAA only as a scheduler and not installing dbt directly on it.
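
For illustration, a rough sketch of triggering such a Batch job from an MWAA DAG. All names here (DAG id, job queue, job definition, image contents) are placeholders, and depending on your apache-airflow-providers-amazon version the class may be called AwsBatchOperator instead of BatchOperator:

```python
# Rough sketch, assuming a Batch job definition whose ECR image has dbt and the
# dbt project baked in. Job queue/definition names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.batch import BatchOperator

with DAG(
    dag_id="dbt_on_aws_batch",       # placeholder
    start_date=datetime(2021, 11, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    dbt_run = BatchOperator(
        task_id="dbt_run",
        job_name="dbt-run",
        job_definition="dbt-job-definition",    # placeholder
        job_queue="dbt-job-queue",              # placeholder
        overrides={"command": ["dbt", "run"]},  # container command override
    )
```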

@Falydoor
Contributor

Hey @Gatsby-Lee,

I agree with @maker100, you should avoid running heavy processes like DBT directly on MWAA. My Airflow DAG triggers an ECS task that runs on Fargate to run my DBT code, so I don't have to worry about resource allocation.

@Gatsby-Lee

@maker100 @Falydoor
Hi, from the bottom of my heart, I really appreciate your comments.
I didn't expect such a fast reply to my question :)

I have a couple of follow-up questions.
Q1. Do you mean that MWAA triggers (or executes through an Operator) another AWS service like AWS Batch or Fargate to do the run?
Q2. Does that mean DBT is built as an image?

Thank you

@Falydoor
Contributor

1: I used this operator to trigger my ECS task https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/operators/ecs.html
2: Yes, I have a Dockerfile that uses an image with dbt (https://hub.docker.com/r/fishtownanalytics/dbt) and then my dbt code is copied to it.
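
A rough sketch of what that looks like in a DAG, using the ECS operator from the Amazon provider. The DAG id, cluster, task definition, container name, and network settings are all placeholders, and in newer provider versions the class is named EcsRunTaskOperator:

```python
# Rough sketch, assuming an ECS task definition whose container is built from a dbt
# image with the dbt project copied in. All resource names below are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.ecs import ECSOperator

with DAG(
    dag_id="dbt_on_ecs_fargate",      # placeholder
    start_date=datetime(2021, 11, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    dbt_run = ECSOperator(
        task_id="dbt_run",
        cluster="dbt-cluster",                   # placeholder
        task_definition="dbt-task-definition",   # placeholder
        launch_type="FARGATE",
        overrides={
            "containerOverrides": [
                {"name": "dbt", "command": ["dbt", "run"]},  # container name is a placeholder
            ]
        },
        network_configuration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-xxxxxxxx"],     # placeholder
                "securityGroups": ["sg-xxxxxxxx"],  # placeholder
                "assignPublicIp": "ENABLED",
            }
        },
    )
```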

@Gatsby-Lee

@Falydoor
Thank you for your reply 👍

@joaquimsage

Hello there @Falydoor ,

I have been following the same steps as this conversation thread, and most of the errors mentioned here have been happening to me as well.
I guess there have been no updates or fixes from AWS for running DBT on MWAA so far. My guess is that the best solution (and the least work, compared to other options like DBT on EC2, DBT on Lambdas, etc.) would be to run DBT on ECS, invoke it from tasks within the DAGs, and keep a well-decoupled architecture between Airflow and the dbt environment and transformations themselves. Is that what you were doing?

@Falydoor
Contributor

Hello @joaquimsage,

Yes, correct! The ECS Airflow operator can be used to run your task definition on your ECS cluster (use Fargate so you don't have to manage EC2s). One "small" drawback is that the task usually takes about a minute to start, so it delays your DBT run a bit.

About MWAA, I don't think AWS will do any updates to fix the permission/read-only issues 😬.

@vijayscbitscrunch

Hi all, if you want to run dbt directly on Airflow:

Please make these changes to dbt_project.yml, as only the tmp directory has read-write permission in MWAA.
packages-install-path: "/usr/local/airflow/tmp/dbt_packages"
log-path: "/usr/local/airflow/tmp/logs"
target-path: "/usr/local/airflow/tmp/target" # directory which will store compiled SQL files
clean-targets: # directories to be removed by dbt clean
  - "/usr/local/airflow/tmp/target"
  - "/usr/local/airflow/tmp/dbt_packages"
