Skip to content

Patrick-devX/TensorFlow2

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 

Repository files navigation

############################
# Book hands On Machine Learning with Scikit Leaarn, Keras and TensorFlow2

# Chapter 10. Introduction to Artificial Neural Networks with Keras
McCulloch and Pitts proposed a very simple model of the biological neuron, which later become known as an artificial neuron:
it has one or more binary (on/off) inputs and one binary output. The artificial neuron activates its output when more than a certain number of its inputs are active.

The Perceptron is one of the simplest ANN architectures. It is based on a slightly different artificial neuron called a THRESHOLD LOGIC UNIT (TLU), or sometimes
a LINEAR THRESHOLD UNIT(LTU). A Perceptron is simply composed of a single layer of TLU, which each TLU connected to all the inputs. When all the neurons in a layer are
connected to every neuron in the previous layer, the layer is called fully connected layer , or a Dense layer.
The Perceptrons are trained using a variant of this rule that takes into account the error made by the network when makes a prediction; the perceptron learning rule
reinforces connections that help reduce the error, More specially, the Perceptron is fed one training instance time, and for each instance it makes its predictions

Scikit Learn prvides a Perceptron class that implements a single TLU (Threshold Logic Unit) network. In fact Scikit Learn's Perceptron class is equivalent to using
an SGDClassifier with the following hyperparameters: loss="perceptron", learning_rate="constant" with no regularizations.

Contrary to Logistic regression Classifiers, Perceptrons do not output a class probability; rather, they make predictions based on a hard threshold. This is the one reason
to prefer Logistic Regression over Perceptrons

In their 1969 monograph Perceptrons, Marvin Minsky and Seymour Papert highlighted a number of serous weaknesses of Perceptrons---in particular, the fact that
they are incapable of solving some trivial problems (e. g. the Exclusive Or(XOR)) classification problem. It turns that some of the limitations of the Perceptrons
 can be eliminated by stacking multiple Perceptrons. The resulting ANN is called a Multilayer Perceptron (MLP). An MLP can solve the XOR Problem.

 ## The Multilayer Perceptron and backpropagation
 An MLP is composed of one (passthrough) input Layer, one or more Layers of TLU (Theshold Logic Unit), called hidden Layers and one final layer of TLUs called the output layer.
 The signal flows only in one direction (from the inputs to the outputs), so this architechture is an example of feedfirward neural network (FNN)
 When an ANN contains a deep stack of hidden layers, it is called a deep neural network (DNN).

 MLPs are trained using the backpropagation training algorithm. In short ist the Gradient Descent.


 Implementing MLPs with keras
 Since 2016, other implementations have been released. You can now run Keras on Apache MXNET, Appele Core ML, JavaScript or TypeScript(to run Keras in a Web browser)

 ########################################################################

 Building an Image Classifier Using the Sequential API  imageClassifier1.py

 * Flatten Layer: Convert each imput Image into 1D array
 * High number of weights gives the model quite a lot of flexibility to fit the training data, but it also means that the model runs the risk of overfiting,
  especially when you do not have a lot of training data
 * sparse_categorical_crossentropy: loss because we have sparse labels (i.e., for each instance, there is just a target class index from 0 to 9 in this case)
    If instead we had one target probability per class for each instance ( such as one-hot vectors, e.g, [0.,0.,1.,0.,0.,0.,0.,0.,0.,0.] to represent 3),
    then we would need to use the categorical_crossentropy loss instead.
 * If we are doing binary classification ( with one or more binary labels) , then we would use the sigmoid ( i.e. logistic) activation function in the output layer
 instead of the softmax activation function, and we would use the binary_crossentropy

 * Instead of passing a validation set using the validation_data argument, you could
    set validation_split to the ratio of the training set that you want Keras to use for
    validation. For example, validation_split=0.1 tells Keras to use the last 10% of
    the data (before shuffling) for validation.

 * Skewed data:
If you are talking about the regular case, where your network produces only one output, then your assumption is correct. In order to force your algorithm to treat every instance of class 1 as 50 instances of class 0 you have to:

    Define a dictionary with your labels and their associated weights

    class_weight = {0: 1.,
                    1: 50.,
                    2: 2.}
    Feed the dictionary as a parameter:

    model.fit(X_train, Y_train, nb_epoch=5, batch_size=32, class_weight=class_weight)

EDIT: "treat every instance of class 1 as 50 instances of class 0" means that in your loss function you assign higher value to these instances.
Hence, the loss becomes a weighted average, where the weight of each sample is specified by class_weight and its corresponding class.


################ Hyperparameters check ###################
* The first parameter to check is the Learning Rate
* Second try another Optimizer
* Third Number of neurons per layer
* Type of activation function
* batch size

+ Evaluation of the model after these m´with the function evaluate(): model.evaluate(x_test, y_test)

############################### Building a Regression MLP Using Sequential API #####################
* activation: 'relu'
* loss: 'mean_squared_error'
* optimizer: 'sgd'

############################### Building Complex Models Using the Functional API #################### #page 402
* This NN architecture connects  all or part of the inputs directly to the output layer. This architecture makes it possibl
for NN to learn both deep patterns (using the deep path) and simple rules (through the short path)

########################## Using Callbacks ###################### 411
    * ModelCheckpoint callback
    * EarlyStopping callback
* The ModelCheckpoint callback saves checkpoints of the model at regular intervals during training, by default at the end of each epoch:
* The EarlyStopping Callback interrupt training when measures no progress on the validation set for a number of epochs ( definded by the patience argument)
* You can combine both callbacks to save checkpoints of your model to yoid wasting time and resources
* For extra controle, you can easily write your own custom callbacks

######################################### Chapter 11. Training Deep Neural Networks ######################################################### 434
####### Vanishing and exploding gradients problems ######### :
* During Training the signal need to flow proprely in both directions: in the forward direction when making prediction, and
in the reverse direction when backpropagating gradients. We dont want the signal to die out, nor do we want it to explode and saturate.
* For the signal to flow properly, the variance of the outputs of each layer should be equal to the varaince of its inputs. It actually not possile
to guarantee both unless the layer hast an equal number of inputs an neurons. But a good compronise that has proven to work very well in pactice:
* The connection weights of each layer must be initialized randomly as described, where fanavg = (fanin + fanout)/2
* By default Keras uses Glorot initialisation with a uniform distribution.

    kernel_initializer="he_uniform"
    kernel_initializer="he_normal"

keras.layers.Dense(10, activation="relu", kernel_initializer="he_normal") #based fanin
he_avg_init = keras.initializers.VarianceScaling(scale=2., mode='fan_avg',distribution='uniform')
keras.layers.Dense(10, activation="sigmoid",kernel_initializer=he_avg_init)

##################### Nosaturating Activation Functions ######################
* ReLU activation function behaves good in DL, mostly because it does not saturate for positives values (and because it s fast to compute)
Unfortunaly, the ReLU activation function is not perfect. It suffers from a problem known as the dying ReLUs:during training, some neurons effedtively "die,".
In some cases, you may find that half of your network's neurons are dead, especially if you used a large learning Rate.

* To solve this problem, you may want to use a variant of the ReLU function, such as the leaky ReLU.
* SELU > ELU > leaky ReLU > ReLU > tanh > logistic
* If you care a lot about runtime latency then you may prefer leaky ReLU with default valuy by Keras 0.3

   keras.layers.LeakyReLU(alpha=0.2),

###################### Batch Normalization ##################
* Anthough using He initialization along with ELU ( or any variant of ReLU) can reduce the danger of the vanishing/exploding gradients probmens at the beginning of training.
it doesn't guarantee that they won't come back during training.
* The Batch Normalization (BN) addresses these problems: The technique consists of adding an operation in the model just before or after the activation function
of each hidden layer. This operation simply zero-centers and normalizes each input, then scales and shifts the result using two new parameter vectors per layer:
One for scaling one for shifting.
* In many Cases, if you add a BN layer as the verry first layer of your NN, you do not need to standardize your training set (e.g. using StandardScaler)

##################### Using TensorBoard for Visualization ########################
#TensorBoard set up
# 1 Create your directory where the logs have to be saved
# 2 Create the tensorBoard callback and specifie the lod directory in it
# 3 call the callback in the fit() method
# 4 Got to the termonal and activate the virtual environement variable: .\venv\Scripts\activate
# 5 Go back to the project directory myproject\mylog
# 6 call: tensorboard --logdir=./my_logs --port=6006
# 7 After this, just change some hyperparameter and train the function again. to see the new logs in Tensorboard, just refresh the webapp

####################### Fine-Tuning neural Network Hyperparametres ###################
How do you know what combination of hyperparameters is the best for your tasks?
* One option is to simply try many combinations of hyperparameters and see which one works best on the validation set ( or use k-fold cross validation)
For example, we can use GridSearchCV or RandomizedSearchCV
**** Number of Neurons per Hidden Layers****
+ Using same number of neurons in all hidden layers perform better
**** Learning Rate *****
* the learning rate ist the most important hyperparameter
* start by 10-5 and gradually increasing it up to 10 in 500 iterations
**** Optimizer *****
**** Batch Size *****
* General use of small batch site 2-32
***** Activation Function *******
* For Hidden lyaers RelU will be good
* For the output layer, it really depends on your task
**** Number of iterations *****
##################################################################################################################################################
##################################################################################################################################################

############################################ CHAPTER 11 ########################################################

########### Reusing Pretrained Layers ############ 453
It is generally not a good idea to train very large DNN from scratch:
instead, you should always try to find an existing neural network that
accomplishes a similar task to the one your trying to tackle.
* This technique is called transfer learning: It not only speed up training considerably, but also require significantly less training data.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages