Using Logistic Regression Analysis and Linear Discriminant Analysis to identify the risk factors of Diabetes

  • Nazeera Sedeeq kareem Barznji Department of Statistics / College Administration and Economic / Salahaddin University-Erbil
Keywords: logistic regression, Maximum Likelihood


Many medical studies point to a close relationship between the diagnosis of a disease and statistical methods such as Logistic Regression Analysis (LRA) and Linear Discriminant Analysis (LDA). Both are widely used multivariate methods for data analysis and prediction. In this paper, both analyses are discussed and applied to a sample of 250 diabetes patients collected from the Layla Qassem Center for Diabetes in Erbil. The data contain 8 variables: one dependent variable, representing the presence or absence of diabetes, and 7 predictors (independent variables) that enter the model as risk factors for diabetes: High Blood Pressure (Hypertension), Family History, Body Mass Index (BMI, Obesity), Diet (Nutrition), High Lipid in Blood, Physical Activity, and Age.

The paper aims to compare Logistic Regression Analysis and Linear Discriminant Analysis on several measures of predictive accuracy, in order to choose the better statistical model for identifying the risk factors of diabetes. It has two parts: theoretical aspects and practical aspects. The tests carried out with both analyses show a higher prediction rate for Logistic Regression Analysis. The area under the ROC curve for all variables, which is used to compare the predictive power of the models, likewise indicates that Logistic Regression Analysis gives the better prediction of the risk factors of diabetes and is the appropriate model. Logistic Regression Analysis thus emerges as a robust alternative to Linear Discriminant Analysis. According to the logistic regression, the risk factors of diabetes rank as follows:

1. Family History; 2. BMI (obesity rate); 3. High Lipid in Blood; 4. Physical Activity; 5. Hypertension; 6. Diet (Nutrition) = 2.033. Age does not represent a risk factor.
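The comparison by area under the ROC curve described above can be illustrated with a minimal sketch. It is not the paper's computation: the labels and the two sets of predicted scores below are hypothetical toy values, and the function name `roc_auc` is chosen here for illustration. The AUC is computed through its rank (Mann-Whitney) interpretation, that is, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney rank formulation.

    Counts, over all (positive, negative) pairs, how often the positive
    case has the higher score; ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (NOT from the paper): 1 = diabetes present, 0 = absent.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical predicted scores from two already-fitted models.
scores_model_a = [0.9, 0.8, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]
scores_model_b = [0.9, 0.6, 0.4, 0.3, 0.7, 0.5, 0.2, 0.1]

auc_a = roc_auc(labels, scores_model_a)
auc_b = roc_auc(labels, scores_model_b)
better = "model A" if auc_a > auc_b else "model B"
print(f"AUC A = {auc_a:.4f}, AUC B = {auc_b:.4f}, higher AUC: {better}")
```

In a real comparison the two score lists would be the predicted probabilities (or discriminant scores) produced by the fitted logistic regression and linear discriminant models on the same cases; the model with the larger AUC separates the two groups better.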


How to Cite
kareem Barznji N. Using Logistic Regression Analysis and Linear Discriminant Analysis to identify the risk factors of Diabetes. JAHS [Internet]. 27 Dec. 2018 [cited 10 Jul. 2020];22(6):248-268. Available from: