But when I check their skewness using library e in R as seen below, I found that the skewness of income is neither especially high nor especially low. My question is: how do I determine whether I need to use a log transformation in a regression model?
Income is commonly accepted to be right-skewed: people making disproportionately large amounts of income pull the mean much higher than the median.
Its skewness in that particular data set may not contradict the prevailing norm. Using log income also lowers the impact of heteroskedasticity. However, this is not the best use of it; if heteroskedasticity is a problem, you may want to use GLS instead.
This is a question of your theory or functional form. Is a dollar the same for a millionaire and for a pauper? If so, choose the linear form. If a dollar does nothing for a millionaire but a lot for a pauper, choose ln.
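As a rough way to look at the right-skew point raised above, here is a minimal sketch in Python. The income figures are made up for illustration, and `skewness` is a hypothetical helper using the simple moment-based definition, which may differ slightly from what a given R package reports.

```python
import math
import statistics

def skewness(values):
    """Moment-based skewness: the mean of cubed z-scores (population form)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / sd) ** 3 for v in values) / n

# Made-up incomes for illustration only: most are modest, one is very large.
incomes = [12_000, 18_000, 22_000, 25_000, 30_000,
           34_000, 40_000, 55_000, 90_000, 400_000]
log_incomes = [math.log(v) for v in incomes]

# A mean well above the median is the usual sign of right skew.
print("mean:", statistics.mean(incomes), "median:", statistics.median(incomes))
print("skewness of income:    ", round(skewness(incomes), 2))
print("skewness of log income:", round(skewness(log_incomes), 2))
```

In this toy sample the log transformation reduces the skewness without removing it. Skewness alone does not settle the question; the choice still rests on the functional form you believe in.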
If your dependent variable is also in logs, your coefficient is an elasticity, a very important concept in economics. A coefficient is still called a semielasticity even if the independent variable is in logs and the dependent variable is not!
This changes the assumptions about the distance between the zeroth and first dollars of income and the distance between the 10,th and 10,st dollars of income.
In the graph on the right, logging the per capita GDP gives us a scale that is far more sensitive to differences when integers are small than when they are large. That difference between having no per capita GDP and having just one dollar of per capita GDP, or between one dollar and ten dollars has a relatively greater impact than the difference between 10, and 10, or between 10, and 10, Logged values are sensitive to differences in orders of magnitude.
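The sensitivity to orders of magnitude can be sketched numerically. The levels below are hypothetical values chosen only to show the scale, not data from the graph. The sketch starts at 1 rather than 0 because the logarithm of zero is undefined.

```python
import math

# Hypothetical per capita GDP levels, one order of magnitude apart.
levels = [1, 10, 100, 1_000, 10_000]

# Each step is ten times the previous one: the linear gap keeps growing,
# while the gap on the log10 scale stays the same.
for low, high in zip(levels, levels[1:]):
    linear_gap = high - low
    log_gap = math.log10(high) - math.log10(low)
    print(f"{low:>6} -> {high:>6}: linear gap = {linear_gap:>5}, log10 gap = {log_gap:.1f}")

# The same one-dollar increase shrinks on the log scale as the base grows.
for base in (1, 100, 10_000):
    gap = math.log(base + 1) - math.log(base)
    print(f"{base} -> {base + 1}: log gap = {gap:.5f}")
```

On the logged scale, going from 1 to 10 counts as much as going from 1,000 to 10,000, and an extra dollar matters less the more dollars there already are.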
There is an order of magnitude change between 1 and 10, then not again until we get to 100, not again until we get to 1,000, and not again until we get to 10,000. The distance between each of these milestones grows successively larger.
Why do logged scales tend to produce better fit lines for per capita income level data than the linear scale does?
That is quite meaningful. Now you are able to take the subway, get something to eat, and make a call at a pay phone, three things you would not have been able to do when you had nothing.
The point here is that when folks have no income, they are a lot more sensitive to small changes in income than they are when they have a measurable income. The more income they have, the less sensitive they are to small or even moderate changes in income.
This is why economists and quantitative social scientists almost always log measures of income. The assumptions I just explained are almost always true.
The model fits better when per capita GDP is logged and it appears that there may be a positive relationship between money and happiness after all.
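To see the mechanism behind a better fit under logging, here is a minimal sketch on synthetic data. These are not the happiness or GDP figures discussed above: the values are constructed so that the outcome rises with the log of GDP, and `r_squared` is a hypothetical helper for a one-predictor least-squares line.

```python
import math

def r_squared(x, y):
    """R^2 of a one-predictor least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Synthetic data, not real survey results: the outcome is built to rise
# with log GDP, plus a small fixed wobble so the fit is not perfect.
gdp = [500, 1_000, 2_000, 5_000, 10_000, 20_000, 40_000, 80_000]
wobble = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05]
happiness = [2.0 + 0.5 * math.log(g) + w for g, w in zip(gdp, wobble)]

r2_linear = r_squared(gdp, happiness)
r2_logged = r_squared([math.log(g) for g in gdp], happiness)
print(f"R^2, raw GDP:    {r2_linear:.3f}")
print(f"R^2, logged GDP: {r2_logged:.3f}")
```

Because the data were generated from a log relationship, the logged predictor fits better by construction. With real data, the comparison is only suggestive and should be read alongside the theoretical argument for diminishing sensitivity to income.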
These happiness measures are rather uninspiring.