Hey,
When I run the code, I get a lot of exceptions from interrupted data transmissions:
java.net.SocketException: Data transfer interrupted (broken pipe) (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.spark.api.python.PythonAccumulatorV2.merge(PythonRDD.scala:732)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$updateAccumulators$1(DAGScheduler.scala:1524)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$updateAccumulators$1$adapted(DAGScheduler.scala:1515)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:1515)
at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1627)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2588)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2533)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2522)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
I am running the code on a Hadoop cluster with Spark v3.2.0.3.2.7170.0-49 on YARN, with Python 3.9.19.
The code seems to run fine, but the exceptions clutter the logs, which is a bit annoying.
The output looks like the one in your example:
+---------+----------+-----------------------+
|    word1|     word2|jaro_winkler_similarity|
+---------+----------+-----------------------+
|jellyfish|smellyfish|              0.8962963|
|       li|       lee|              0.6111111|
|    luisa|     bruna|                    0.6|
|     null|      null|                   null|
+---------+----------+-----------------------+
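For reference, the similarity values in the table match the standard Jaro–Winkler definition. Below is a minimal pure-Python sketch (independent of Spark; `jaro` and `jaro_winkler` are hypothetical stand-ins for whatever UDF the code actually uses) that reproduces them:

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: fraction of matched characters, penalized for transpositions."""
    if not s1 or not s2:
        return 0.0
    # Characters match if equal and within half the longer string's length.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1 = [False] * len(s1)
    used2 = [False] * len(s2)
    matches = 0
    for i, c in enumerate(s1):
        lo, hi = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not used2[j] and s2[j] == c:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Half the number of matched characters that appear in a different order.
    t = sum(a != b for a, b in zip(
        (c for i, c in enumerate(s1) if used1[i]),
        (c for j, c in enumerate(s2) if used2[j]))) / 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3

def jaro_winkler(s1: str, s2: str, p: float = 0.1, threshold: float = 0.7) -> float:
    """Boost the Jaro score for a shared prefix (up to 4 chars), but only above the threshold."""
    j = jaro(s1, s2)
    if j <= threshold:
        return j
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

print(jaro_winkler("jellyfish", "smellyfish"))  # ~0.8962963
print(jaro_winkler("li", "lee"))                # ~0.6111111
print(jaro_winkler("luisa", "bruna"))           # 0.6
```

Note that "li"/"lee" share the prefix "l", but no prefix boost is applied because the Jaro score (0.611) is below the usual 0.7 boost threshold.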