6 January 2016
Description of the model.
Among the G-7 countries, the seven richest countries in the world, the euro-zone members are suffering from high and persistent unemployment. What, then, is the situation in Japan? In the 1980s, with the highest economic growth and the lowest jobless rate among the major industrial countries, Japan was shining and was described as "Japan as Number One" or "Model Japan." After the bubble burst in 1990, the era of "a lost decade" began, in which the growth rate was nearly zero and in some years below zero. In 1997, Japan was on the verge of a great depression, caused by a financial crisis that led to a series of bankruptcies of financial institutions, including a big city bank and one of the three biggest brokerage houses. Quite naturally, many Japanese were seriously afraid of losing their jobs.
The project therefore aims to find the reasons for the increasing unemployment rate in Japan. To do that, we build an econometric model and find out which factors influence the unemployment rate.
Required data for estimation.
To analyze and test the model, we collected annual data from 1980 to 2011 on the unemployment rate, GDP, GDP per capita, output gap, rate of inflation, employment, population, general government net debt and current account balance in Japan (see Appendix 1).
To estimate the model we used data collected by the International Monetary Fund, which are publicly available (http://www.imf.org).
Matrix of correlation
The correlation coefficients between the dependent variable (the unemployment rate) and the independent variables are close to 1, which implies a high correlation between these variables. To find out the type of relationship between the dependent variable Y1 and the independent variables, we construct scatter diagrams.
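As an illustration of how a single entry of the correlation matrix is obtained, the Pearson coefficient can be computed as in the sketch below. The function name `pearson_r` and both short series are made up for the example; they are not the IMF data from Appendix 1.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical series (NOT the data from Appendix 1)
unemployment = [2.0, 2.5, 3.1, 4.0, 4.7, 5.2]
net_debt = [20.0, 31.0, 45.0, 60.0, 78.0, 90.0]

r = pearson_r(unemployment, net_debt)
print(round(r, 3))
```

A value close to 1 or -1 points to a strong linear relationship; the full matrix simply repeats this calculation for every pair of variables.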
Scatter diagram
A line of best fit can be drawn in order to study the correlation between the variables. An equation for the correlation between the variables can be determined by the best-fit procedures. For a linear correlation, the best-fit procedure is known as linear regression and is guaranteed to generate a correct solution in a finite time.
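The trendline equation and R2 that a spreadsheet reports for a scatter diagram can be reproduced with the closed-form least-squares formulas. The sketch below is only an illustration: `fit_line` is a hypothetical helper and the points are invented so that the result is easy to check by hand.

```python
def fit_line(x, y):
    """Least-squares line y = slope * x + intercept, plus R-squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Share of the variation of y explained by the fitted line
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical points lying exactly on y = 2x + 1
slope, intercept, r2 = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(slope, intercept, r2)
```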
So our next step is to draw scatter diagrams using Excel and find out whether there is correlation between unemployment rate and independent variables.
The diagram shows a positive correlation between these two variables. The equation for the correlation is Y = 6307,2x + 7047,7.
The negative correlation between the unemployment rate and the output gap is slight. The equation for the correlation between the variables is Y = -0,47x + 0,30, and R2 is 0,05, which is very small. Therefore we should exclude this variable from our model.
The correlation between the unemployment rate and inflation is negative. The equation for the correlation between the variables is Y = -1,27x + 5,49. R2 (0,57) is more than 50%, therefore we can use this dependence to construct our model.
The correlation between the unemployment rate and employment is positive, but weak. The equation for the correlation between the variables is Y = 1,44x + 57,089.
There is a strong positive correlation between the unemployment rate and population: Y = 2,32x + 116,37; R2 is relatively high and equals 0,59.
The correlation between the unemployment rate and general government net debt is also positive: Y = 25,98x – 43,35, and R2 is 0,76.
A positive correlation between the unemployment rate and the current account balance exists. The equation is Y = 34,68x – 19,24, but R2 (0,47) is not so high.
These seven scatter diagrams reflect the existence of a linear relationship between the variables, but the relationship between the unemployment rate and the output gap is rather weak, therefore we exclude this independent variable. Now we can pass to the next stage of our project: econometric model testing.
Econometric model
Model specification
Mathematical interpretation of the model is presented below:
Yt = β0 + β2X2t + β4X4t + β5X5t + β6X6t + β7X7t + β8X8t + ut
where Yt is the unemployment rate in percent of the total labor force, X2t is GDP per capita in U.S. dollars, X4t is inflation in percent, X5t is employment in millions of people, X6t is population in millions of people, X7t is general government net debt in percent and X8t is the current account balance in billions of U.S. dollars. β0, β2, β4, β5, β6, β7, β8 are parameters (the sensitivity of the explained variable to changes in the explanatory variables), and ut is the disturbance term.
E(ut) = 0 is the first Gauss-Markov assumption: the error u has an expected value of zero given any value of the explanatory variables. This means that on average the errors balance out. This is not a restrictive assumption, since we can always use β0 so that this equation holds.
The second condition is that the error term has a constant variance. This is the assumption of homoscedasticity.
To estimate the econometric model we use software (Excel). We enter the values of the endogenous and exogenous variables from 1980 to 2010 into the corresponding input ranges of the Regression tool of the Analysis ToolPak. We do not use the data for 2011, because we will use it later to check the adequacy of the model. The confidence level is 95% (significance level 0,05).
We got the following results:
Table 1 Regression statistics
Regression statistics | Value
Multiple R | 0,95
R-squared | 0,91
Adjusted R-squared | 0,88
Standard error | 0,38
Observations | 31
The regression output contains the values of the regression coefficients with a 95% confidence probability and their statistical assessment. Specification of the estimated econometric model:
Adjusted R2 is high (88%). It means that variation in X explains 88% of the variation in Y. Fcrit. is less than F, therefore R2 is not random and the quality of the specification of the econometric model is high.
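The relationship between R-squared and adjusted R-squared in the regression statistics can be checked with the standard formula 1 - (1 - R2)(n - 1)/(n - m - 1). In the sketch below `adjusted_r2` is a hypothetical helper, and m = 6 assumes that the initial specification contains the six regressors X2t, X4t, X5t, X6t, X7t and X8t. Because the inputs are the rounded figures from the text, the result only approximately reproduces the reported adjusted value.

```python
def adjusted_r2(r2, n, m):
    """Adjusted R-squared for n observations and m regressors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - m - 1)

# R-squared = 0,91 and n = 31 are taken from the regression statistics;
# m = 6 is an assumption about the initial specification.
adj = adjusted_r2(0.91, 31, 6)
print(round(adj, 4))
```

The adjustment penalizes additional regressors, which is why the adjusted figure is always somewhat lower than R-squared itself.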
However when checking the significance of the coefficients with the help of t-test, we found out that X2t (GDP per capita) did not pass it, because |t| ≤ tcrit. In this case we should exclude it and reestimate the whole model. Reestimation of the model and regression analysis without X2t is presented in Appendix 2.
While checking the significance of the coefficients with the help of t-test, we found out that X5t (employment) did not pass it, because |t| ≤ tcrit. So we exclude this variable and reestimate the model once again.
Specification of a new econometric model
Mathematical interpretation of the model is presented below:
Yt = β0 + β4X4t + β6X6t + β7X7t + β8X8t + ut
where β0 = -13,69 with a standard error of 5,91, β4 = -0,20 with a standard error of 0,06, β6 = 0,14 with a standard error of 0,05, β7 = 0,03 with a standard error of 0,003, β8 = -0,01 with a standard error of 0,003, and the standard error of the disturbance term is 0,37.
R2 is high (89%). It means that variation in X explains 89% of the variation in Y. Fcrit. is less than F, therefore R2 is not random and the quality of the specification of the econometric model is high.
Model testing
Now we should test our model. The calculated regression coefficients β0, β4, β6, β7, β8 allow us to construct the equation Yt = -13,69 – 0,20X4t + 0,14X6t + 0,03X7t – 0,01X8t + ut, where ut is a random value.
The value of the multiple coefficient of determination R2 equals 0,89. It shows that 89% of the total deviation of Yt is explained by the variation of the factors X4t, X6t, X7t and X8t. Such a value of R2 is good, as it is close to 1. It means that the selected factors have a significant effect on the dependent variable, which confirms the correctness of their inclusion in the estimated model.
Significance F
The calculated level of significance 0,000000000001 < 0,05 (see table 6) confirms the significance of R2.
F-test
This is another way of checking R2. It is based on comparing F with Fcrit.; F should be more than Fcrit. In our case Fcrit. = FINV(0,05;4;26) = 2,74 (FРАСПОБР in the Russian version of Excel), where 4 is the number of degrees of freedom of the numerator, equal to the number of regressors in the equation, m = 4, and 26 is the number of degrees of freedom of the denominator, equal to n-(m+1).
As our F > Fcrit., the H0 hypothesis that R2 = 0 is rejected. That means that R2 is not random and the quality of the specification of our econometric model is high.
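The comparison of F with Fcrit. can be sketched from the figures quoted in the text. The F-statistic is rebuilt here from the rounded R2 with the standard formula F = (R2/m) / ((1 - R2)/(n - m - 1)), so it is only an approximation of the value in the Excel output; `f_statistic` is a hypothetical helper, and the critical value is the one quoted above rather than recomputed.

```python
def f_statistic(r2, n, m):
    """F-statistic for H0: R2 = 0, with n observations and m regressors."""
    return (r2 / m) / ((1 - r2) / (n - m - 1))

R2 = 0.89       # rounded value quoted in the text
N_OBS = 31      # observations 1980-2010
M = 4           # regressors X4t, X6t, X7t, X8t
F_CRIT = 2.74   # critical value quoted in the text for (4; 26) degrees of freedom

f_value = f_statistic(R2, N_OBS, M)
reject_h0 = f_value > F_CRIT
print(round(f_value, 2), reject_h0)
```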
Standard error
Now we should test the importance of the regression coefficients β0, β4, β6, β7, β8. Comparing the elements of the columns Coefficients and Standard Error, we can say that the absolute values of the standard errors are less than the corresponding values of the coefficients, so, at the first stage of analysis, all the variables should remain in the model.
t-test
That is to test the inequality |t| ≥ tcrit., where t is the value of the t-statistic. If the inequality holds, the coefficient and the regressor are considered to be significant, and vice versa.
In our case tcrit. = TINV(0,05; 26) = 2,06 (СТЬЮДРАСПОБР in the Russian version of Excel), where 0,05 is the level of significance and 26 is the number of degrees of freedom, equal to n-(m+1).
After reestimation of the initial model all absolute values of the t-statistics are more than tcrit., therefore all the regression coefficients are significant.
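The t-test can be illustrated with the coefficients and standard errors quoted for this model, since each t-statistic is the coefficient divided by its standard error. The inputs are the rounded figures from the text, so the ratios are approximate; the dictionary layout and names are chosen only for the example.

```python
T_CRIT = 2.06  # critical value quoted in the text for 26 degrees of freedom

# (coefficient, standard error) as quoted in the text, rounded
estimates = {
    "b0": (-13.69, 5.91),
    "b4": (-0.20, 0.06),
    "b6": (0.14, 0.05),
    "b7": (0.03, 0.003),
    "b8": (-0.01, 0.003),
}

# t-statistic = coefficient / standard error
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}
significant = {name: abs(t) >= T_CRIT for name, t in t_stats.items()}

for name in estimates:
    print(name, round(t_stats[name], 2), significant[name])
```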
Goldfeld-Quandt test
If the inequalities of the test do not hold, it could be concluded that the random disturbances are heteroscedastic. In that case the least-squares estimates of the parameters of the linear regression model remain unbiased but lose efficiency, and the usual measures of their accuracy (standard errors) become unreliable.
In our example both inequalities are valid, so the assumption about homoscedasticity of random disturbance is adequate.
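A simplified sketch of the Goldfeld-Quandt procedure is given below. It assumes that the two inequalities are GQ ≤ Fcrit. and 1/GQ ≤ Fcrit., where GQ is the ratio of the residual sums of squares of two sub-samples ordered by a regressor. For brevity it uses a single regressor, invented data, and a critical value of 6,39 for (4; 4) degrees of freedom; none of these come from the appendices.

```python
def rss_simple(x, y):
    """Residual sum of squares of the least-squares line fitted to (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return sum((b - slope * a - intercept) ** 2 for a, b in zip(x, y))

def goldfeld_quandt(x, y, f_crit, drop=0):
    """Return (GQ, homoscedastic?) for a one-regressor model."""
    pairs = sorted(zip(x, y))        # order observations by the regressor
    k = (len(pairs) - drop) // 2     # size of each sub-sample
    first, last = pairs[:k], pairs[-k:]
    gq = rss_simple(*zip(*last)) / rss_simple(*zip(*first))
    return gq, (gq <= f_crit and 1 / gq <= f_crit)

F_CRIT = 6.39  # assumed critical value, (4; 4) degrees of freedom, alpha = 0,05

# Hypothetical data: the same disturbance pattern in both halves ...
noise = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1]
x = [float(i) for i in range(1, 13)]
y_homo = [2 * a + 1 + e for a, e in zip(x, noise + noise)]
# ... and a pattern ten times larger in the second half
y_hetero = [2 * a + 1 + e for a, e in zip(x, noise + [10 * e for e in noise])]

gq_homo, ok_homo = goldfeld_quandt(x, y_homo, F_CRIT)
gq_hetero, ok_hetero = goldfeld_quandt(x, y_hetero, F_CRIT)
print(round(gq_homo, 2), ok_homo, round(gq_hetero, 2), ok_hetero)
```

With several regressors the same idea applies, but each sub-sample is fitted with a multiple regression and the degrees of freedom change accordingly.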
Durbin-Watson test
This test is designed to check a particular case of the third assumption of the Gauss-Markov theorem about the absence of autocorrelation between adjacent random residuals in the model.
Using the values of the residuals ut from Appendix 3, we can compute the Durbin-Watson statistic.
Then we should find the critical values dL and dU of the Durbin-Watson statistic with the help of a special statistical table, where n = 31 is the total number of observations, k = 4 is the total number of factors, and α = 0,05.
In our model dL = 1,16 and dU = 1,74, and the test indicates positive autocorrelation of the model's residuals, so we cannot rely on the least squares technique to estimate the model.
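The Durbin-Watson statistic itself is the sum of squared differences between adjacent residuals divided by the sum of squared residuals. The sketch below shows the calculation and the usual decision rule; the residuals are invented (the actual ones are in Appendix 3), and the bounds 1,16 and 1,74 are reused from the text purely for demonstration, although they strictly apply to n = 31.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic of a residual series."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def dw_decision(dw, d_l, d_u):
    """Classify the statistic using the lower and upper critical bounds."""
    if dw < d_l:
        return "positive autocorrelation"
    if dw <= d_u:
        return "inconclusive"
    if dw < 4 - d_u:
        return "no autocorrelation"
    if dw <= 4 - d_l:
        return "inconclusive"
    return "negative autocorrelation"

# Hypothetical, slowly drifting residuals (NOT those of Appendix 3)
resid = [0.4, 0.5, 0.3, 0.2, -0.1, -0.3, -0.4, -0.2, 0.1, 0.3]
dw = durbin_watson(resid)
print(round(dw, 3), dw_decision(dw, 1.16, 1.74))
```

Residuals that keep the same sign for several periods, as in this invented series, push the statistic towards 0, which is the pattern associated with positive autocorrelation.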
The major reason why autocorrelation occurs is because of the inertia or sluggishness that is present in time series data.
The occurrence of the non-stationary property in time series data also gives rise to the phenomenon of autocorrelation. Thus, to make the time series almost free of the problem of autocorrelation, the researcher should always make the data stationary.
The major consequence of using ordinary least square (OLS) in the presence of autocorrelation is that it will simply make the estimator inefficient. As a result, the hypothesis testing procedures will give inaccurate results due to the presence of autocorrelation.
In our case, in order to get rid of autocorrelation we add one more variable: the unemployment rate in the previous period (Yt-1). The new data for analysis are presented in Appendix 4.
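Adding the lagged unemployment rate means pairing every observation with the value of the previous year, which costs one observation at the start of the sample. A minimal sketch with invented figures (`add_lag` is a hypothetical helper, not part of the original workflow):

```python
def add_lag(series):
    """Return (current values, previous-period values) aligned pairwise."""
    current = series[1:]     # Y_t, starting from the second period
    previous = series[:-1]   # Y_{t-1}, the value one period earlier
    return current, previous

# Hypothetical unemployment rates for five consecutive years
rates = [2.0, 2.2, 2.4, 2.7, 2.6]
y_t, y_lag = add_lag(rates)
print(list(zip(y_t, y_lag)))
```

Applied to the 31 annual observations for 1980-2010, the same step leaves 30 usable observations.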
Specification of the econometric model
Mathematical interpretation of the model is presented below:
Yt = β0 + β4X4t + β6X6t + β7X7t + β8X8t + β9Yt-1 + ut
where Yt is the unemployment rate in percent of the total labor force, X4t is inflation in percent, X6t is population in millions of people, X7t is general government net debt in percent, X8t is the current account balance in billions of U.S. dollars, and Yt-1 is the unemployment rate in the previous period in percent (written as X9t-1 in the estimated equation). β0, β4, β6, β7, β8, β9 are parameters (the sensitivity of the explained variable to changes in the explanatory variables), and ut is the disturbance term. For the new reestimated model we take the data from Appendix 4 and get the following results:
Table 3 Regression statistics
Regression statistics | Value
Multiple R | 0,98
R-squared | 0,96
Adjusted R-squared | 0,95
Standard error | 0,25
Observations | 30
Specification of reestimated econometric model
The specification of our model with the calculated parameters is presented below:
Yt = -7,70 – 0,20X4t + 0,08X6t + 0,01X7t – 0,01X8t + 0,50X9t-1 + ut
where β0 = -7,70 with a standard error of 4,11, β4 = -0,20 with a standard error of 0,05, β6 = 0,08 with a standard error of 0,03, β7 = 0,01 with a standard error of 0,003, β8 = -0,01 with a standard error of 0,002, β9 = 0,50 with a standard error of 0,11, and the standard error of the disturbance term is 0,25.
Adjusted R2 is high (95%). It means that variation in X explains 95% of the variation in Y. Fcrit. is less than F, therefore R2 is not random and the quality of the specification of the econometric model is high.
Model testing
Now we test our model.
The calculated regression coefficients β0, β4, β6, β7, β8, β9 allow us to construct the equation Yt = -7,70 – 0,20X4t + 0,08X6t + 0,01X7t – 0,01X8t + 0,50X9t-1 + ut, where ut is a random value.
The value of the adjusted coefficient of determination R2 equals 0,95. It shows that about 95% of the total deviation of Yt is explained by the variation of the factors X4t, X6t, X7t, X8t and X9t-1. Such a value of R2 is better than in the previous model, as it is closer to 1.
Significance F
The calculated level of significance 6,78354E-16 < 0,05 confirms the significance of R2.
F-test
In our case Fcrit. = FINV(0,05;5;24) = 2,62 (FРАСПОБР in the Russian version of Excel), where 5 is the number of degrees of freedom of the numerator, equal to the number of regressors in the equation, m = 5, and 24 is the number of degrees of freedom of the denominator, equal to n-(m+1).
As our F > Fcrit., the H0 hypothesis that R2 = 0 is rejected. That means that R2 is not random and the quality of the specification of our econometric model is high.
Standard error
Now we should test the importance of the regression coefficients β0, β4, β6, β7, β8, β9. Comparing the elements of the columns Coefficients and Standard Error (table 11), we can say that the absolute values of the standard errors are less than the corresponding values of the coefficients, so, at the first stage of analysis, all the variables should remain in the model.
t-test
Then we should check the significance of the coefficients with the help of the t-test.
That is to test the inequality |t| ≥ tcrit., where t is the value of the t-statistic. If the inequality holds, the coefficient and the regressor are considered to be significant, and vice versa.
In our case tcrit. = TINV(0,05; 24) = 2,06 (СТЬЮДРАСПОБР in the Russian version of Excel), where 0,05 is the level of significance and 24 is the number of degrees of freedom, equal to n-(m+1).
After reestimation of the model all absolute values of the t-statistics are more than tcrit., therefore all the regression coefficients are significant.
Goldfeld-Quandt test
In our example both inequalities are valid, so the assumption about homoscedasticity of random disturbance is adequate.
Durbin-Watson test
Using the values of the residuals ut from Appendix 5, we can compute the Durbin-Watson statistic.
Then we should find the critical values dL and dU of the Durbin-Watson statistic with the help of a special statistical table, where n = 30 is the total number of observations (see Table 3), k = 5 is the total number of factors, and α = 0,05.
In our model dL = 1,07 and dU = 1,83, and the test gives no evidence of autocorrelation of the model's residuals, so we can use the least squares technique to estimate the model.
Confidence interval
The purpose of a confidence interval is to determine a range of values, calculated from sample data, within which the value of a specific parameter lies with the specified probability.
We should also estimate the lower and upper boundaries for each year. We use the following formula: 95% boundary = Yˆ ± tcrit. * st. error, where tcrit. is calculated as shown in the part "t-test", the standard error = 0,25, and Yˆ is the predicted value of Yt. Then we compare the empirical data for each year with the resulting interval boundaries.
Lower level = 4,55 Upper level = 5,56
Adequacy checking
Now we should check whether the value Yˆ predicted by our model describes the empirical data correctly, and test the forecasting capabilities of the model.
Yt should lie within the confidence interval predicted by our model.
In our case we get the following information:
Table 4 Adequacy checking
Lower 95% | Upper 95% | Empirical | Empirical>Lower | Empirical<Upper
4,55 | 5,56 | 4,57 | True | True
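The adequacy check can be sketched as follows. The critical value, the standard error and the empirical 2011 figure are taken from the text; the point forecast of 5,05 is an assumption (roughly the midpoint of the reported boundaries), because the text does not state it. With these rounded inputs the computed boundaries differ slightly from the reported 4,55 and 5,56.

```python
def forecast_interval(y_hat, t_crit, std_error):
    """Lower and upper boundary: y_hat -/+ t_crit * std_error."""
    half_width = t_crit * std_error
    return y_hat - half_width, y_hat + half_width

T_CRIT = 2.06      # from the t-test section
STD_ERROR = 0.25   # standard error of the reestimated model
Y_HAT = 5.05       # ASSUMED point forecast for 2011 (not given in the text)
EMPIRICAL = 4.57   # empirical unemployment rate for 2011

lower, upper = forecast_interval(Y_HAT, T_CRIT, STD_ERROR)
adequate = lower < EMPIRICAL < upper
print(lower, upper, adequate)
```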
So the empirical value for 2011, 4,57, lies between the upper and lower boundaries predicted by our model. The formula for error in prediction is:
So the error in the prediction equals 10,5%: the forecast for 2011 deviates from the empirical value by about one tenth, while the empirical value remains inside the confidence interval.
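The formula for the prediction error is not reproduced in the text. One common definition, assumed here, is the relative error |Yˆ - Y| / Y expressed in percent. With the assumed point forecast of 5,05 (not stated in the text) and the empirical value of 4,57, this definition gives a figure close to the reported 10,5%.

```python
def relative_error_percent(predicted, actual):
    """Absolute prediction error as a percentage of the actual value."""
    return abs(predicted - actual) / actual * 100

Y_HAT = 5.05      # ASSUMED point forecast for 2011 (not given in the text)
EMPIRICAL = 4.57  # empirical unemployment rate for 2011

error = relative_error_percent(Y_HAT, EMPIRICAL)
print(round(error, 1))
```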
Conclusion.
In the project we constructed an econometric model of unemployment and found out which factors influence it. We used statistical information on different macroeconomic indicators of Japan, including the unemployment rate, GDP, GDP per capita, population, current account balance, general government net debt, employment, inflation and output gap. While checking our model we concluded that there is a strong correlation (1,00) between two independent variables, GDP and GDP per capita, so we excluded one of them. Some of our variables were dropped: the output gap because of its weak relationship with the unemployment rate, and employment and GDP per capita because they did not pass the t-test. It means that they are not significant in our estimation.
However, at the stage of model testing we faced the problem of autocorrelation of residuals, so in order to solve it we added one more variable: the unemployment rate in the previous period.
Thus the general form of the model that can be used in practice looks as follows:
Yt = a0 + a4X4t + a6X6t + a7X7t + a8X8t + a9X9t-1
where Yt is the unemployment rate in percent of the total labor force, X4t is the rate of inflation, X6t is population, X7t is general government net debt, X8t is the current account balance, and X9t-1 is the unemployment rate in the previous period. a0, a4, a6, a7, a8 and a9 are parameters.
From the estimated econometric model we can interpret the coefficients:
· If inflation increases by 1 percentage point, the unemployment rate will decrease by 0,20 percentage points;
· If population increases by 1 million people, the unemployment rate will increase by 0,08 percentage points;
· If general government net debt increases by 1 percentage point, the unemployment rate will increase by 0,01 percentage points;
· If the current account balance increases by 1 billion U.S. dollars, the unemployment rate will decrease by 0,01 percentage points;
· If the unemployment rate in the previous period increases by 1 percentage point, the current unemployment rate will increase by 0,5 percentage points.
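The interpretation of the coefficients can be checked by evaluating the estimated equation twice, changing one input at a time while holding the others fixed. The coefficients below are the rounded values from the text; the function name and the input values are invented for the example and are not observations from the appendices.

```python
def predict_unemployment(inflation, population, net_debt,
                         current_account, prev_unemployment):
    """Estimated equation with the rounded coefficients quoted in the text."""
    return (-7.70
            - 0.20 * inflation
            + 0.08 * population
            + 0.01 * net_debt
            - 0.01 * current_account
            + 0.50 * prev_unemployment)

# Hypothetical input values (NOT actual observations)
base = dict(inflation=0.0, population=127.0, net_debt=120.0,
            current_account=150.0, prev_unemployment=5.0)

y_base = predict_unemployment(**base)
y_more_inflation = predict_unemployment(**{**base, "inflation": 1.0})

# Effect of one additional percentage point of inflation, others unchanged
effect = y_more_inflation - y_base
print(round(y_base, 2), round(effect, 2))
```

The difference between the two predictions equals the inflation coefficient, which is exactly what the first bullet states.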
In conclusion, the relationship between changes in the unemployment rate and the macroeconomic indicators listed above can be useful to policymakers and economists. Our model has good explanatory ability and can be used for general data analysis and forecasting.
List of references
1. Ball, L., 1999, Aggregate demand and long-run unemployment, Brookings Papers on Economic Activity 2, 189-251.
2. Fuess, S.M., 2006, Working hours in Japan: Who is time-privileged, IZA Discussion Paper 2195, Bonn.
3. Nickell, S., L. Nunziata and W. Ochel, 2005, Unemployment in the OECD since the 1960s: What do we know?, The Economic Journal, 115, 1-27.
4. Phelps, E.S., 1997b, Discussion to Edmund Phelps's theory of structural slumps, in: D.J. Snower and G. de la Dehesa, eds., Unemployment Policy (Cambridge University Press), 142-150.
5. Woodford, M., 1994, Structural slumps, Journal of Economic Literature, 32 (4), 1784-1815.
6. http://www.imf.org