from.dfs produces "file does not exist" error #161
Comments
can you enter ... and paste the output back here?
Without closing that R session where you did the last step, try at the shell:

hadoop jar $HADOOP_STREAMING dumptb /tmp/RtmpmQu2O7/file1b584440dee3

(in reply to kardes, Tue, Mar 17, 2015, 10:39 AM)
I opened a new terminal window (without closing the current one with the R session) and entered that line:

$ hadoop jar $HADOOP_STREAMING dumptb /tmp/RtmpmQu2O7/file1b584440dee3
Make sure HADOOP_STREAMING is set in that shell instance. It looks like it is not.

(in reply to kardes, Tue, Mar 17, 2015, 10:52 AM)
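For reference, a minimal way to set and verify the variable in the current shell; the jar path is the CDH 5.3 MR1 one quoted later in this thread, so adjust it to your install:

$ export HADOOP_STREAMING=/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.5.0-mr1-cdh5.3.2.jar
$ echo $HADOOP_STREAMING   # should print the jar path, not an empty line
$ ls -l $HADOOP_STREAMING  # the jar must actually exist at that path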
Could you please provide specific instructions on how to do it? In R I run

small.ints <- to.dfs(1:1000)

and I check out(). Then I get out of R using Ctrl-Z and entering bg (putting R into the background), and at the shell I run

$ hadoop jar $HADOOP_STREAMING dumptb /tmp/Rtmp5Nt5L7/file25ed2392eeba

and get: Not a valid JAR: /home/cloudera/dumptb. Thanks.
That's part of installing rmr2. No HADOOP_STREAMING, no rmr2.

(in reply to kardes, Tue, Mar 17, 2015, 12:13 PM)
Hi Antonio,
You are seriously telling me you do not understand a list of two files?

(in reply to kardes, Fri, Mar 20, 2015, 11:09 AM)
Sorry, of course not. I still get the error at the top of this page (the original error). I tried the following: I opened a terminal and entered a command, then in R I entered the example code, and on the last line I get the same error. I tried the following last, but I am not sure how to proceed; please help. Thanks.

Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): Path is not a file: /tmp/RtmpuZJz6S/file1c1778b1f9b/_logs
The last error you got is expected, because you did a dumptb on a directory: the job output is a directory containing entries such as _logs, and dumptb operates on the individual files inside it.

(in reply to kardes, Fri, Mar 20, 2015, 12:49 PM)
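The distinction, sketched with paths of the shape that appears later in the thread (the Rtmp directory and file names differ on every run):

$ hadoop fs -ls /tmp/RtmpXXXXXX/fileNNNNNNNNNNNN
# the job output is a directory: _SUCCESS, _logs, part-00000, part-00001, ...
$ hadoop jar $HADOOP_STREAMING dumptb /tmp/RtmpXXXXXX/fileNNNNNNNNNNNN/part-00000 > /tmp/dumptb.out
# dumptb expects a file argument; pointing it at the directory hits _logs and fails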
I did update the .bash_profile file upon your recommendation, and I still get the error. Could you please let me know how to proceed? Thank you very much. Here is a snapshot of my shell:

[cloudera@quickstart ~]$ echo "export HADOOP_STREAMING=/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.5.0-mr1-cdh5.3.2.jar" >> ~/.bash_profile

(he then starts R; the usual R 3.1.2 "Pumpkin Helmet" startup banner follows)
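Worth noting (an observation, not from the thread): appending to ~/.bash_profile only affects new login shells, so a shell and any R session started before the edit will not see the variable until the file is re-read. A quick check:

$ source ~/.bash_profile          # or open a fresh terminal
$ echo $HADOOP_STREAMING          # must print the jar path
$ R
> Sys.getenv("HADOOP_STREAMING")  # rmr2 reads this environment variable; it must not be ""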
I am at a loss. Those are not files that rmr2 manipulates, at least not directly.

(in reply to kardes, Fri, Mar 20, 2015, 3:07 PM)
I am not using YARN, I am using MR1; that's why I am doing the echo of the MR1 streaming jar into .bash_profile shown above.

(in reply to Antonio Piccolboni, Fri, Mar 20, 2015, 4:54 PM)
You understand that since I can't reproduce this, unless you can give me a way to see it the debugging has to happen on your side. Take the output directory from the streaming log line, e.g.

15/03/20 15:01:02 INFO streaming.StreamJob: Output: /tmp/RtmpT6ad7h/...

and at the shell run

hdfs dfs -ls <path>
hadoop jar $HADOOP_STREAMING dumptb <path>/<part file>

where <> brackets mean "replace with actual value". It should fail exactly the way the from.dfs function fails. If that's the case, paste here what you see.

(in reply to kardes, Fri, Mar 20, 2015, 5:11 PM)
Actually I have done this before, and what I get when I do ...

(in reply to Antonio Piccolboni, Mon, Mar 23, 2015, 10:10 AM)
I don't know. I am very new to all this and maybe I made a mistake setting things up.

(in reply to Erim Kardes, Mon, Mar 23, 2015, 2:09 PM)
Sorry, I forgot to add a redirection to that command, so you got the whole dump on your terminal. It should be

hadoop jar $HADOOP_STREAMING dumptb <path>/<part file> > /tmp/dumptb.out

(the last greater-than sign to be entered as is, no substitutions). The other thing is that if this succeeds, it's more of an rmr2 problem (my problem). In that case, in R enter

debug(from.dfs)
from.dfs(out)

step until the dumptb function definition, and step one more, then

debug(dumptb)
c

You are now in dumptb, a Very Simple Function. Please print the contents of src and also

paste(hadoop.streaming(), ...

and paste the results here. The idea is that the dumptb function does little more than build and execute that command line.

(in reply to kardes, Mon, Mar 23, 2015, 2:11 PM)
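For readers following along, the session Antonio describes looks roughly like this (a sketch; out is the mapreduce result from the example at the bottom of this issue, and dumptb, hadoop.streaming and rmr.normalize.path are rmr2 internals, so prompts and names may differ by version):

debug(from.dfs)   # flag from.dfs so the next call enters the debugger
from.dfs(out)     # step with 'n' until the dumptb definition appears, then step once more
debug(dumptb)     # flag the inner helper as well
c                 # continue; execution now stops inside dumptb
# inside dumptb, inspect what it is about to run:
src               # the list of part files to dump
paste(hadoop.streaming(), "dumptb", rmr.normalize.path(src[[1]]), ">>", rmr.normalize.path(dest))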
Correction: that cmd should read

paste(hadoop.streaming(), "dumptb", rmr.normalize.path(src[[1]]), ">>", rmr.normalize.path(dest))

(in reply to Antonio Piccolboni, Mon, Mar 23, 2015, 2:30 PM)
After running the mapreduce job, I did the following:

[cloudera@quickstart ~]$ hadoop fs -ls /tmp/RtmpUzjoWy/file1c7dbd916e7
Found 4 items

and then

[cloudera@quickstart ~]$ hadoop jar $HADOOP_STREAMING dumptb /tmp/RtmpUzjoWy/file1c7dbd916e7/part-00001 > /tmp/dumptb.out

so I did not get an error in this case. So I continued:

Browse[2]> debug(dumptb)
Browse[2]> paste(hadoop.streaming(),"dumptb",rmr.normalize.path(src[[1]]),">>",rmr.normalize.path(dest))

Please let me know how to proceed. Thanks for your time.
I have it: part.list is failing, probably a problem with hdfs.ls. Please run

rmr2:::hdfs.ls(out())

and share what that returns, and its class.

(in reply to kardes, Wed, Mar 25, 2015, 3:56 PM)
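A minimal way to capture what he is asking for (res is a hypothetical name; hdfs.ls is internal to rmr2, hence the ::: ):

res <- rmr2:::hdfs.ls(out())  # list the job output directory through rmr2's internal helper
class(res)                    # the class of the return value, as requested
str(res)                      # and a look at its structure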
Browse[2]> rmr2:::hdfs.ls(out())
This is close to impossible. Please enter

packageDescription("rmr2")

(in reply to kardes, Wed, Mar 25, 2015, 4:25 PM)
packageDescription("rmr2")
...
-- File: /usr/lib64/R/library/rmr2/Meta/package.rds
Please upgrade to the latest version. Thanks, Antonio.

(in reply to kardes, Wed, Mar 25, 2015, 4:36 PM)
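rmr2 is distributed from the RHadoop repositories rather than CRAN, so upgrading means installing a downloaded build; a sketch, with a hypothetical tarball path:

packageDescription("rmr2")$Version  # the installed version that turned out to be stale
# download the latest rmr2 release from the RHadoop GitHub repository, then:
install.packages("/path/to/rmr2_x.y.z.tar.gz", repos = NULL, type = "source")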
made it! thanks!
Hi,
I set up R and Hadoop using the Cloudera QuickStart VM, CDH 5.3.
R version 3.1.2, VirtualBox Manager 4.3.20, running on Mac OS X 10.7.5.
I followed the blog
http://www.r-bloggers.com/integration-of-r-rstudio-and-hadoop-in-a-virtualbox-cloudera-demo-vm-on-mac-os-x/
to set up R and Hadoop, and turned off MR2/YARN; instead I am using MR1.
Everything seems to work fine but the from.dfs function.
I am using the simple example in R:

small.ints <- to.dfs(1:1000)
out <- mapreduce(input = small.ints, map = function(k, v) keyval(v, v^2))
df <- as.data.frame(from.dfs(out))

from.dfs produces the following error. If you could be of any help, I'd greatly appreciate it. Thank you very much. -EK
When I use it I get the error:
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:8020/user/cloudera/128432
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.streaming.DumpTypedBytes.run(DumpTypedBytes.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

(the same FileNotFoundException and identical stack trace repeat for hdfs://localhost:8020/user/cloudera/422 and hdfs://localhost:8020/user/cloudera/122)
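For completeness, the end-to-end sequence that should work on this setup once the environment is right and rmr2 is current (a sketch assuming the CDH 5.3 MR1 paths quoted above; HADOOP_CMD never comes up in this thread, but rmr2 requires it too, and /usr/bin/hadoop is an assumption about the standard CDH location):

# set before loading rmr2, in the R session:
Sys.setenv(HADOOP_CMD = "/usr/bin/hadoop")
Sys.setenv(HADOOP_STREAMING = "/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.5.0-mr1-cdh5.3.2.jar")

library(rmr2)
small.ints <- to.dfs(1:1000)                          # write 1..1000 to HDFS as typed bytes
out <- mapreduce(input = small.ints,
                 map = function(k, v) keyval(v, v^2)) # square each value
df <- as.data.frame(from.dfs(out))                    # read the result back; the step that failed here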