Is it possible to get hdfs-cleaner working/packaged with non-pnda HDFS?
We're running HDP and currently clean up old files through an NFS mount (via the HDFS NFSGateway), using find to delete them. It's slow and buggy.
I really haven't found any good solution to delete old files in HDFS. I would like to give your cleaner a try.
Hi @Raboo I believe this will work fine with generic HDFS.
Looking at the code, the part that cleans up PNDA datasets should be skipped if there is no dataset table available to read in HBase, and that's the only PNDA-specific bit.
If you aren't using CDH or HDP as the Hadoop distro you will have to do a bit of work, as it wants to connect to either Cloudera Manager (CDH) or Ambari (HDP) to discover the endpoints to use. If you don't have either of those, you would have to add another implementation of endpoint discovery in endpoint.py, for example one that takes the values directly from the config file.
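As a rough illustration of the "values directly in the config file" option, here is a minimal sketch. It is not taken from endpoint.py: the function name, the `endpoints` section and the three key names are all hypothetical, and it assumes the config has already been parsed into a dict.

```python
# Hypothetical key names, chosen for illustration only; the real
# hdfs-cleaner config may name (or structure) these differently.
REQUIRED_KEYS = ("webhdfs_host", "webhdfs_port", "yarn_resource_manager")


def endpoints_from_config(config):
    """Return endpoint settings read straight from an already-parsed config
    dict, instead of asking Cloudera Manager or Ambari for them."""
    endpoints = config.get("endpoints", {})
    missing = [key for key in REQUIRED_KEYS if key not in endpoints]
    if missing:
        # Fail early so a half-filled config is not mistaken for a valid one.
        raise ValueError("missing endpoint settings: " + ", ".join(missing))
    return {key: endpoints[key] for key in REQUIRED_KEYS}
```

Something along these lines would let the cleaner run against a cluster that has neither manager, at the cost of keeping the hostnames up to date by hand.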
There are a few different categories of file in HDFS that it cleans up:
spark_streaming_dirs_to_clean - checks that the files do not correspond to currently running YARN jobs before deleting them
general_dirs_to_clean - deletes all files from here
old_dirs_to_clean - deletes files from here if their last modified time is older than a certain age
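To make the difference between the categories concrete, here is a small sketch of the two selection rules that are not simply "delete everything". It is an illustration, not the cleaner's actual code: the function names are hypothetical, the entries are assumed to look like WebHDFS LISTSTATUS results (`pathSuffix`, and `modificationTime` in milliseconds since the epoch), and matching a streaming directory to a job by its name being an application ID is an assumption.

```python
import time


def older_than(statuses, max_age_seconds, now=None):
    """old_dirs_to_clean style: pick entries whose last modified time is
    more than max_age_seconds in the past.

    `statuses` is assumed to be a list of dicts shaped like WebHDFS
    FileStatus objects. `now` is in seconds since the epoch.
    """
    now_ms = (time.time() if now is None else now) * 1000
    cutoff_ms = now_ms - max_age_seconds * 1000
    return [s["pathSuffix"] for s in statuses if s["modificationTime"] < cutoff_ms]


def not_running(dir_names, running_app_ids):
    """spark_streaming_dirs_to_clean style: pick directories that do not
    belong to a currently running YARN job.

    Assumption: a directory is tied to a job by being named after its
    application ID; the real matching rule may differ.
    """
    running = set(running_app_ids)
    return [name for name in dir_names if name not in running]
```

For general_dirs_to_clean no such filter applies, since everything found there is deleted.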
Let me know how you get on, and do submit a patch if you manage to extend it in a useful way.
We are using HDP.
spark_streaming_dirs_to_clean, general_dirs_to_clean, old_dirs_to_clean: aren't those all just the same thing, i.e. folders where you look for older files that you can delete?
Or does spark_streaming_dirs refer to the Spark history server?
Do you need to fill in all the fields in the properties file?