This repository has been archived by the owner on May 27, 2020. It is now read-only.
When I load data from Mongo into Spark, I see a weird performance issue in the Spark UI. The job is split into two stages: the first is flatMap at MongodbSchema.scala:41, the second is aggregate at MongodbSchema.scala:47. My problem is that the first stage always gets a single task on a single executor, which is painfully slow on large collections. Sometimes the flatMap stage takes an hour while the next one finishes in a few seconds. The relevant source code is below:
It looks like it is just inferring the schema from the collection. I don't know why this stage is limited to one executor. Is this normal, or is there something I can do to increase the number of executors and get better performance? I am working with Spark 1.6.2 and Stratio 0.11.0.
Thanks.
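Since the flatMap stage appears to be schema inference, one thing that might shorten it is sampling fewer documents. A hedged sketch, reusing the MongodbConfigBuilder keys that appear later in this thread (Host, Database, Collection, SamplingRatio); whether 0.11.0 parallelizes this stage is not verified here:

```scala
// Sketch, assuming the spark-mongodb config keys shown in this thread;
// 0.11.0 behaviour not verified. The flatMap stage samples documents to
// infer a schema, so a lower ratio should mean far fewer documents read.
val config = MongodbConfigBuilder(
  Map(
    Host -> mongoHost,
    Database -> mongoDatabase,
    Collection -> mongoCollection,
    SamplingRatio -> 0.01  // infer the schema from roughly 1% of documents
  )
).build()
val df = sqlContext.fromMongoDB(config)
```

If your connector version supports supplying an explicit schema, that would avoid the sampling scan altogether.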
I am facing a similar problem; can someone give an update on this? To reduce the fetch time, I'm trying to load the Mongo collection using a split key on an ISODate field, without any success.
I'm using the following config to load the data:
val mongoConfig = MongodbConfigBuilder(
  Map(
    Credentials -> List(slaveCredentials),
    Host -> mongoHost,
    Database -> mongoDatabase,
    Collection -> mongoCollection,
    SamplingRatio -> 1.0,
    WriteConcern -> "normal",
    SplitSize -> "10",
    SplitKey -> "created_at",
    SplitKeyMin -> "2016-11-20T10:01:32.239Z",
    SplitKeyMax -> "2016-11-23T10:01:32.239Z",
    SplitKeyType -> "isoDate"
  )
).build()
val mongoDF = spark.sqlContext.fromMongoDB(mongoConfig)
But what I found was that this query fetches all the data into Spark, which is very slow because of the single executor in the flatMap stage.
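For reference, here is a self-contained sketch of what an ISODate split key is meant to achieve: dividing [SplitKeyMin, SplitKeyMax) into disjoint time ranges so each Spark partition can fetch its own slice with a $gte/$lt filter on created_at. splitRanges is a hypothetical helper for illustration only, not part of the connector's API:

```scala
import java.time.{Duration, Instant}

// Hypothetical helper: split [min, max) into `partitions` contiguous
// sub-ranges of roughly equal duration, one per Spark partition.
def splitRanges(min: Instant, max: Instant, partitions: Int): Seq[(Instant, Instant)] = {
  val stepMillis = Duration.between(min, max).toMillis / partitions
  (0 until partitions).map { i =>
    val lo = min.plusMillis(stepMillis * i)
    // Last range is clamped to max so no tail documents are dropped.
    val hi = if (i == partitions - 1) max else min.plusMillis(stepMillis * (i + 1))
    (lo, hi)
  }
}

val min = Instant.parse("2016-11-20T10:01:32.239Z")
val max = Instant.parse("2016-11-23T10:01:32.239Z")
val ranges = splitRanges(min, max, 3)
// Each (lo, hi) pair maps to a query like
// { created_at: { $gte: lo, $lt: hi } } that one executor can run.
```

If the connector honored the split key, each executor would read one such disjoint slice in parallel instead of a single task scanning everything.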