Shifu 0.2.5 Upgrade Guagua to Latest Version
Two key features from Guagua 0.6.x.
Guagua 0.6.x uses new coordinators by default (NettyMasterCoordinator and NettyWorkerCoordinator). Worker results are no longer stored on the ZooKeeper server, so ZooKeeper is only used to store master results and for coordination. A ZooKeeper server with large memory and disk is therefore not needed: an embedded ZooKeeper server (in the client or on the master node) is enough for each model-training Guagua job, and users do not need to specify a ZooKeeper ensemble.
Before Guagua 0.6.0, Guagua relied on memory to store training data, which is the key to accelerating training. For large data sets, or on clusters without enough memory, users can now keep part of the training data in memory and the rest on local disk.
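The memory-plus-disk idea above can be illustrated with a minimal sketch. This is not Guagua's implementation: the class name `SpillingStoreSketch`, the `memoryCapacity` knob, and the binary file layout are all hypothetical, chosen only to show records filling memory first and overflowing to a local file.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: keeps the first records in memory and spills the rest to a local file. */
public class SpillingStoreSketch implements AutoCloseable {
    private final int memoryCapacity;                 // max records held in memory (assumed knob)
    private final List<double[]> inMemory = new ArrayList<>();
    private final File spillFile;                     // local temp file for overflow records
    private DataOutputStream spillOut;                // opened lazily on the first spill
    private long spilledCount = 0;

    public SpillingStoreSketch(int memoryCapacity) throws IOException {
        this.memoryCapacity = memoryCapacity;
        this.spillFile = File.createTempFile("training-data", ".bin");
        this.spillFile.deleteOnExit();
    }

    /** Adds a record to memory while there is room, otherwise appends it to the spill file. */
    public void add(double[] record) throws IOException {
        if (inMemory.size() < memoryCapacity) {
            inMemory.add(record);
            return;
        }
        if (spillOut == null) {
            spillOut = new DataOutputStream(
                    new BufferedOutputStream(new FileOutputStream(spillFile)));
        }
        spillOut.writeInt(record.length);             // length prefix, then the values
        for (double v : record) {
            spillOut.writeDouble(v);
        }
        spilledCount++;
    }

    /** Visits in-memory records first, then spilled ones; returns the number of records visited. */
    public long forEach(Consumer<double[]> visitor) throws IOException {
        for (double[] record : inMemory) {
            visitor.accept(record);
        }
        if (spilledCount > 0) {
            spillOut.flush();                         // make buffered writes visible to the reader
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(new FileInputStream(spillFile)))) {
                for (long i = 0; i < spilledCount; i++) {
                    double[] record = new double[in.readInt()];
                    for (int j = 0; j < record.length; j++) {
                        record[j] = in.readDouble();
                    }
                    visitor.accept(record);
                }
            }
        }
        return inMemory.size() + spilledCount;
    }

    public int inMemoryCount() { return inMemory.size(); }

    public long spilledCount() { return spilledCount; }

    @Override
    public void close() throws IOException {
        if (spillOut != null) {
            spillOut.close();
        }
        spillFile.delete();
    }
}
```

Each training iteration would call `forEach` once. Records in memory are read at full speed, and only the overflow pays the cost of disk I/O.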
One important feature from Guagua 0.7.0
The partial-complete feature means that in each iteration the master waits for only a portion of the workers to complete and ignores the results of straggler workers. In Shifu 0.2.5, ‘guagua.min.workers.ratio’ is set to 0.99 (the default, 1.0, means the master waits for all workers) and ‘guagua.min.workers.timeout’ is set to 20000 ms. With these settings, in each iteration the master waits only for 99% of the workers within 20 seconds; a straggler worker that is ignored in an iteration catches up with the latest iteration after it finishes its current one. You can also override these two parameters in your shifuconfig file.
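The waiting rule can be sketched as a small decision function. This is an illustration under one assumed reading of the two parameters, not Guagua's actual code: the master proceeds as soon as every worker has reported, or once the timeout has elapsed and at least the minimum ratio of workers has reported. The class and method names are hypothetical.

```java
/** Hypothetical sketch of the partial-complete decision, not Guagua's implementation. */
public class PartialCompleteSketch {

    /**
     * Returns true when the master may start the next iteration.
     *
     * @param totalWorkers     number of workers in the job
     * @param completedWorkers workers that have reported results for this iteration
     * @param minWorkersRatio  value of guagua.min.workers.ratio (e.g. 0.99)
     * @param elapsedMillis    time the master has waited in this iteration
     * @param timeoutMillis    value of guagua.min.workers.timeout (e.g. 20000)
     */
    static boolean canProceed(int totalWorkers, int completedWorkers,
                              double minWorkersRatio, long elapsedMillis, long timeoutMillis) {
        if (completedWorkers >= totalWorkers) {
            return true; // every worker reported, no reason to keep waiting
        }
        // Assumed semantics: the ratio only applies once the timeout has passed.
        boolean timedOut = elapsedMillis >= timeoutMillis;
        // Small epsilon guards against floating-point rounding in the product.
        boolean enoughWorkers = completedWorkers + 1e-9 >= totalWorkers * minWorkersRatio;
        return timedOut && enoughWorkers;
    }
}
```

With 100 workers, a ratio of 0.99 and a timeout of 20000 ms, the function keeps the master waiting while only 99 workers have reported at 5 seconds, lets it proceed with 99 workers once 20 seconds have passed, and still blocks it at 98 workers regardless of elapsed time.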