{'title': 'What is the difference between linear regression and logistic regression?',
 'question': {'text': '\nWhen we have to predict the value of a categorical (or discrete) outcome, we use logistic regression. I believe we also use linear regression to predict the value of an outcome given the input values.\nThen, what is the difference between the two methodologies?\n',
  'comments': []},
 'answers': [{'text': "\n\nLinear regression output as probabilities\nIt's tempting to use the linear regression output as probabilities, but it's a mistake because the output can be negative or greater than 1, whereas a probability cannot. Because linear regression can produce probabilities below 0 or above 1, logistic regression was introduced.\nSource: http://gerardnico.com/wiki/data_mining/simple_logistic_regression\n\nOutcome\nIn linear regression, the outcome (dependent variable) is continuous. It can have any one of an infinite number of possible values.\nIn logistic regression, the outcome (dependent variable) has only a limited number of possible values.\nThe dependent variable\nLogistic regression is used when the response variable is categorical in nature. For instance, yes/no, true/false, red/green/blue, 1st/2nd/3rd/4th, etc.\nLinear regression is used when your response variable is continuous. For instance, weight, height, number of hours, etc.\nEquation\nLinear regression gives an equation of the form Y = mX + C, i.e., an equation of degree 1.\nHowever, logistic regression gives an equation of the form Y = e^(mX + C) / (1 + e^(mX + C)), equivalently Y = 1 / (1 + e^-(mX + C)), which always lies between 0 and 1.\nCoefficient interpretation\nIn linear regression, the interpretation of the coefficients of the independent variables is quite straightforward (i.e., holding all other variables constant, with a unit increase in this variable, the dependent variable is expected to increase/decrease by xxx).\nHowever, in logistic regression, a coefficient gives the expected change in the log-odds of the outcome for a unit increase in that variable, holding the others constant; exponentiating it gives an odds ratio. More generally, for generalized linear models the interpretation depends on the family (binomial, Poisson, etc.) and the link (logit, log, inverse, etc.) you use; logistic regression is the binomial family with the logit link.\nError minimization technique\nLinear regression uses the ordinary least squares method to minimize the errors and arrive at the best possible fit, while logistic regression uses the maximum likelihood method to arrive at the solution.\nLinear regression is usually solved by minimizing the least squares error of the model to the data; therefore large errors are penalized quadratically.\nLogistic regression behaves differently. With the logistic loss function, predictions that lie far on the correct side of the decision boundary incur a loss that approaches a constant (zero), and predictions on the wrong side are penalized only linearly rather than quadratically.\nConsider linear regression on categorical {0, 1} outcomes to see why this is a problem. If your model predicts the outcome is 38 when the truth is 1, you've lost nothing in classification terms. Linear regression would still try to reduce that 38; logistic regression wouldn't (as much).\n\n",
   'comments': ['Is there a difference between Y = e^X/1 + e^-X and Y = e^X + e^-X ?',
    'e^X/1? Anything divided by 1 is unchanged, so there is no difference. I am sure you meant to ask something else.',
    'I know this is an old thread, but given your statement "Logistic regression is used when the response variable is categorical in nature. For instance, yes/no, true/false, red/green/blue, 1st/2nd/3rd/4th, etc.", what\'s the difference between this and classification, then?',
    '@kingJulian Logistic regression is indeed used for classification. Check this out; you might find it useful, as I did.',
    '@kingJulian: Logistic regression is a classification technique, and classification refers to a family of algorithms that predict one of a small number of outcomes.']},
  {'text': '\nIn linear regression, the outcome (dependent variable) is continuous. It can have any one of an infinite number of possible values. In logistic regression, the outcome (dependent variable) has only a limited number of possible values.\nFor instance, if X contains the area in square feet of houses, and Y contains the corresponding sale prices of those houses, you could use linear regression to predict selling price as a function of house size. While the selling price cannot actually take any value, there are so many possible values that a linear regression model would be chosen.\nIf, instead, you wanted to predict, based on size, whether a house would sell for more than $200K, you would use logistic regression. The possible outputs are either Yes, the house will sell for more than $200K, or No, the house will not.\n',
   'comments': ['In Andrew\'s logistic regression example of cancer, I can draw a horizontal line y=.5 (which obviously passes through y=.5); then if any point is above this line y=.5 => +ve, else -ve. So why do I need logistic regression? I am just trying to understand the best-case explanation for using logistic regression.',
    '@vinita: here or here is a simple example for not using linear regression and then thresholding for classification problems.',
    'Logistic regression is a better classifier on categorical data than linear regression. It uses a cross-entropy error function instead of least squares. Therefore it is not as sensitive to outliers, and it also does not punish "too correct" data points the way least squares does.']},
  {'text': "\nJust to add on the previous answers.\nLinear regression\nIs meant to resolve the problem of predicting/estimating the output value for a given element X (say f(x)). The result of the prediction is a continuous function where the values may be positive or negative. In this case you normally have an input dataset with a lot of examples and the output value for each one of them. The goal is to fit a model to this dataset so you are able to predict that output for new, never-seen elements. The classical example is fitting a line to a set of points, but in general linear regression can be used to fit more complex models (using higher polynomial degrees).\n\nResolving the problem\nLinear regression can be solved in two different ways:\n\nNormal equation (direct way to solve the problem)\nGradient descent (iterative approach)\n\nLogistic regression\nIs meant to resolve classification problems where, given an element, you have to assign it to one of N categories. Typical examples are, given an email, classifying it as spam or not, or, given a vehicle, finding which category it belongs to (car, truck, van, etc.). That is, the output is a finite set of discrete values.\nResolving the problem\nLogistic regression has no closed-form solution like the normal equation, so it is solved iteratively, for example with gradient descent (Newton-type methods are also common). The formulation in general is very similar to linear regression; the only difference is the use of a different hypothesis function. In linear regression the hypothesis has the form:\nh_theta(x) = theta^T x = theta_0 + theta_1*x_1 + theta_2*x_2 + ...\nwhere theta is the model we are trying to fit and [1, x_1, x_2, ..] is the input vector. In logistic regression the hypothesis function is different:\nh_theta(x) = g(theta^T x), where g(z) = 1 / (1 + e^-z)\nThis function has a nice property: it maps any value to the range (0, 1), which is appropriate for handling probabilities during classification. For example, in the case of binary classification, g(theta^T x) can be interpreted as the probability of belonging to the positive class. In this case you normally have different classes that are separated by a decision boundary, which is basically a curve that separates the different classes.\n[Figure: an example dataset separated into two classes by a decision boundary]\n\n",
   'comments': []},
  {'text': '\nThey are both quite similar in how the solution is found, but as others have said, one (Logistic Regression) is for predicting a category "fit" (Y/N or 1/0), and the other (Linear Regression) is for predicting a value.\nSo if you want to predict whether you have cancer, Y/N (or a probability), use logistic regression. If you want to know how many years you will live, use linear regression!\n',
   'comments': []},
  {'text': '\nThe basic difference:\nLinear regression is basically a regression model, which means it gives a continuous (non-discrete) output of a function. So this approach gives the value. For example: given x, what is f(x)?\nFor example, given a training set of different factors and the price of a property, after training we can provide the required factors to determine what the property price will be.\nLogistic regression is basically a binary classification algorithm, which means that here there will be a discrete-valued output, obtained by thresholding the function. For example: for a given x, if f(x) > threshold, classify it as 1, else classify it as 0.\nFor example, given a set of brain tumour sizes as training data, we can use the size as input to determine whether a tumour is benign or malignant. Therefore here the output is discrete, either 0 or 1.\n*Here the function is basically the hypothesis function.\n',
   'comments': []},
  {'text': "\nSimply put, linear regression is a regression algorithm, which outputs a continuous value from a potentially infinite range; logistic regression is considered a binary classifier algorithm, which outputs the 'probability' of the input belonging to a label (0 or 1).\n",
   'comments': ['Thank goodness I read your note about probability. Was about to write off logistic as a binary classifier.']},
  {'text': '\nRegression means predicting a continuous variable, and linear means there is a linear relation between y and x.\nEx: You are trying to predict salary from the number of years of experience. So here salary is the dependent variable (y) and years of experience is the independent variable (x).\ny = b0 + b1*x1\n\nWe are trying to find the optimum values of the constants b0 and b1, which give the best-fitting line for the observed data.\nIt is the equation of a line, which gives a continuous value for any x, from x=0 to very large values.\nThis line is called the linear regression model.\nLogistic regression is a type of classification technique. Do not be misled by the term regression. Here we predict whether y=0 or 1.\nHere we first need to find p(y=1), the probability that y=1 given x, from the logistic (sigmoid) function p = 1 / (1 + e^-(b0 + b1*x1)).\n\nEquivalently, the probability p is related to the linear predictor through the log-odds: ln(p / (1 - p)) = b0 + b1*x1.\n\nEx: we can classify a tumour with more than a 50% chance of being cancerous as 1 and a tumour with less than a 50% chance as 0.\n\n[Figure: red points are predicted as 0, whereas green points are predicted as 1.]\n',
   'comments': []},
  {'text': '\nIn short:\nLinear Regression gives continuous output, i.e., any value within a range of values.\nLogistic Regression gives discrete output, i.e., Yes/No, 0/1 kinds of outputs.\n',
   'comments': []},
  {'text': '\nCannot agree more with the above comments.\nBeyond that, there are some more differences:\nIn linear regression, residuals are assumed to be normally distributed.\nIn logistic regression, residuals need to be independent but need not be normally distributed.\nLinear regression assumes that a constant change in the value of the explanatory variable results in a constant change in the response variable.\nThis assumption does not hold if the value of the response variable represents a probability (as in logistic regression).\nGLMs (generalized linear models) do not assume a linear relationship between the dependent and independent variables. However, they assume a linear relationship between the link function of the expected response (the logit, in the logit model) and the independent variables.\n',
   'comments': []},
  {'text': '\nTo put it simply: if a linear regression model is used for classification and more training examples arrive that are far away from the threshold (say 0.5) for predicting y=1 vs. y=0, the hypothesis will change and become worse. Therefore the linear regression model is not used for classification problems.\nAnother problem is that if the classes are y=0 and y=1, h(x) can be > 1 or < 0. So we use logistic regression, where 0 <= h(x) <= 1.\n',
   'comments': []},
  {'text': '\nLogistic regression is used for predicting categorical outputs like Yes/No, Low/Medium/High, etc. There are basically two types of logistic regression: binary logistic regression (Yes/No, Approved/Disapproved) and multi-class logistic regression (Low/Medium/High, digits from 0-9, etc.).\nOn the other hand, linear regression is used if your dependent variable (y) is continuous.\ny = mx + c is a simple linear regression equation (m is the slope and c is the y-intercept). Multiple linear regression has more than one independent variable (x1, x2, x3, etc.).\n',
   'comments': []},
  {'text': '\nIn linear regression the outcome is continuous, whereas in logistic regression the outcome has only a limited number of possible values (discrete).\nExample:\nIf the given value x is the size of a plot in square feet, then predicting y, i.e., the rate (price) of the plot, is a linear regression problem.\nIf, instead, you wanted to predict, based on size, whether the plot would sell for more than 300000 Rs, you would use logistic regression. The possible outputs are either Yes, the plot will sell for more than 300000 Rs, or No.\n',
   'comments': []}]}
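The first answer above points out that a least-squares line fitted to 0/1 labels can produce values below 0 or above 1, which cannot be probabilities. The sketch below checks this on a small made-up dataset (an assumption, not data from the thread) using NumPy's `np.polyfit`. It then passes the same linear scores through the logistic (sigmoid) function to show that the squashed values always stay strictly between 0 and 1. The sigmoid step only illustrates the squashing; it is not a fitted logistic regression model.

```python
import numpy as np

# Made-up data for illustration: one feature x and binary labels y in {0, 1}.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])

# Least-squares fit of y = m*x + c; np.polyfit returns [m, c] for degree 1.
m, c = np.polyfit(x, y, 1)


def sigmoid(z):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


x_new = np.array([-5.0, 0.0, 4.5, 10.0, 20.0])
linear_scores = m * x_new + c       # unbounded: can be below 0 or above 1
squashed = sigmoid(linear_scores)   # always strictly between 0 and 1

print("linear:", linear_scores)
print("sigmoid:", squashed)
```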
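The first answer's "predicts 38 when the truth is 1" example, and the comment that least squares punishes "too correct" points, can be checked numerically. This minimal sketch compares the squared error with the logistic (log) loss, both written in terms of a linear score. The helper names `squared_error` and `logistic_loss` are illustrative only and do not come from any library.

```python
import math


def squared_error(score, target):
    """Least-squares penalty, as used when fitting linear regression."""
    return (score - target) ** 2


def logistic_loss(score, target):
    """Log loss for a label in {0, 1} given a linear score z.

    Equal to log(1 + exp(-margin)), where margin = z for target 1 and -z for
    target 0, computed in a numerically stable way for large |margin|.
    """
    margin = score if target == 1 else -score
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


for score in (38.0, 1.0, 0.0, -38.0):
    print(score, squared_error(score, 1), logistic_loss(score, 1))
```

For a score of 38 with true label 1, the squared error is 1369 while the log loss is essentially zero. For a score of -38, the log loss is about 38, so it grows linearly in the wrong direction rather than quadratically.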
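The third answer lists two ways of solving linear regression: the normal equation and gradient descent. The sketch below compares them on synthetic data. It assumes NumPy, and the data-generating line (roughly y = 2 + 3x), the learning rate and the iteration count are arbitrary choices for illustration. Both approaches should arrive at essentially the same coefficients.

```python
import numpy as np

# Synthetic data (an assumption): y is roughly 2 + 3*x plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=50)
y = 2.0 + 3.0 * x + rng.normal(0, 0.1, size=50)

# Design matrix whose rows are [1, x_1], matching the input vector [1, x_1, ...].
X = np.column_stack([np.ones_like(x), x])

# 1) Normal equation: solve (X^T X) theta = X^T y directly.
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# 2) Batch gradient descent on half the mean squared error.
theta_gd = np.zeros(2)
learning_rate = 0.5
for _ in range(5000):
    gradient = X.T @ (X @ theta_gd - y) / len(y)
    theta_gd -= learning_rate * gradient

print("normal equation:", theta_normal)
print("gradient descent:", theta_gd)
```

The normal equation is a single linear solve, while gradient descent needs a learning rate and enough iterations. Gradient descent becomes attractive when there are very many features or when no closed form exists, as in logistic regression.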
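Several answers describe the logistic hypothesis h_theta(x) = g(theta^T x) with a 0.5 threshold, using tumour size as the example input. The sketch below implements this as plain gradient descent on the mean log loss, assuming NumPy. The tumour sizes and labels are invented, and the learning rate and iteration count are assumptions. The helpers `predict_proba` and `predict` are hypothetical names, and real code would normally use a library solver.

```python
import numpy as np


def sigmoid(z):
    """g(z) = 1 / (1 + e^-z), mapping any real score into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Invented tumour sizes (cm) and labels (1 = malignant, 0 = benign).
# The classes overlap on purpose so that the maximum-likelihood fit is finite.
size = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5])
label = np.array([0, 0, 0, 1, 0, 1, 0, 1, 1, 1], dtype=float)

X = np.column_stack([np.ones_like(size), size])  # rows [1, x_1]
theta = np.zeros(2)
learning_rate = 0.1
for _ in range(20000):
    p = sigmoid(X @ theta)                     # h_theta(x) = g(theta^T x)
    gradient = X.T @ (p - label) / len(label)  # gradient of the mean log loss
    theta -= learning_rate * gradient


def predict_proba(s):
    """Estimated P(malignant | size = s)."""
    return sigmoid(theta[0] + theta[1] * np.asarray(s, dtype=float))


def predict(s, threshold=0.5):
    """Classify as 1 when the estimated probability reaches the threshold."""
    return (predict_proba(s) >= threshold).astype(int)


print("theta:", theta)
print(predict_proba([1.0, 5.5]), predict([1.0, 5.5]))
```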
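The coefficient-interpretation part of the first answer and the "constant change" assumption in a later answer can both be illustrated with a fixed logistic model. In this sketch the coefficients `b0` and `b1` are hypothetical values, not fitted to anything. A one-unit increase in x always multiplies the odds by exp(b1), but the resulting change in probability depends on where x starts.

```python
import math

# Hypothetical coefficients of a logistic model (assumed values).
b0, b1 = -4.0, 0.8


def probability(x):
    """P(y = 1 | x) when the log-odds are b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def odds(x):
    """Odds p / (1 - p) implied by the model at x."""
    p = probability(x)
    return p / (1.0 - p)


for x in (0.0, 2.5, 5.0):
    # A one-unit increase in x adds b1 to the log-odds, so the odds are
    # multiplied by exp(b1) regardless of the starting x...
    ratio = odds(x + 1.0) / odds(x)
    # ...but the change in probability depends on the starting point.
    delta_p = probability(x + 1.0) - probability(x)
    print(f"x={x}: odds ratio={ratio:.4f}, exp(b1)={math.exp(b1):.4f}, delta p={delta_p:.4f}")
```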
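The answer explaining that far-away examples make a thresholded linear regression worse, and the comment about thresholding a linear fit at y=.5, can be demonstrated with one extra data point. This sketch assumes NumPy and made-up data. It fits a least-squares line, finds where the line crosses 0.5, and then adds a single positive example far to the right. `linear_threshold` is a hypothetical helper written for this example.

```python
import numpy as np


def linear_threshold(x, y, cutoff=0.5):
    """Fit y = m*x + c by least squares; return the x where the line equals cutoff, plus m and c."""
    m, c = np.polyfit(x, y, 1)
    return (cutoff - c) / m, m, c


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
boundary_before, _, _ = linear_threshold(x, y)

# Add one made-up positive example far from the boundary. It is correctly
# labelled, just "very positive".
x_more = np.append(x, 30.0)
y_more = np.append(y, 1.0)
boundary_after, m_after, c_after = linear_threshold(x_more, y_more)

print("boundary before:", boundary_before)
print("boundary after:", boundary_after)
# The positive example at x = 5 now scores below 0.5 and would be misclassified.
print("score at x = 5:", m_after * 5.0 + c_after)
```

With these numbers the 0.5 crossing moves from 4.5 to about 5.55. The point at x = 5 (label 1) ends up with a score of about 0.483, so the extra, perfectly valid example causes a misclassification. This is the behaviour the logistic loss avoids by barely penalizing points that are already far on the correct side.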