Using Logistic Regression Analysis and Linear Discriminant Analysis to identify the risk factors of Diabetes

Nazeera Sedeeq kareem Barznji




Many medical studies point to a close relationship between the diagnosis of a disease and statistical methods such as Logistic Regression Analysis (LRA) and Linear Discriminant Analysis (LDA), two widely used multivariate methods for prediction. In this paper both analyses are discussed and applied to data on a sample of 250 diabetes patients collected from the Layla Qassem Center for Diabetes in Erbil. The data contain eight variables: one dependent variable representing the presence or absence of diabetes, and seven predictors (independent variables) that enter the model as risk factors for diabetes: high blood pressure (hypertension), family history, body mass index (BMI, obesity), diet (nutrition), high blood lipids, physical activity, and age.
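For orientation, the two models can be written in their usual textbook form. This is a sketch only: the symbols Y, x_j, \beta_j and w_j are assumed notation and are not taken from the paper. Y = 1 denotes the presence of diabetes and x_1, ..., x_7 denote the seven predictors.

```latex
% Logistic regression: the log-odds of diabetes is linear in the predictors
\[
\log\frac{P(Y=1\mid x)}{1-P(Y=1\mid x)} = \beta_0 + \sum_{j=1}^{7}\beta_j x_j
\]
% Linear discriminant analysis: a linear score separating the two groups
\[
D(x) = w_0 + \sum_{j=1}^{7} w_j x_j
\]
```

Both methods give a linear rule in the predictors. They differ in how the coefficients are estimated: logistic regression maximizes the conditional likelihood of Y, while LDA assumes normally distributed predictors with a common covariance matrix in the two groups.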

The paper aims to compare Logistic Regression Analysis and Linear Discriminant Analysis on several measures of predictive accuracy in order to choose the better statistical model for identifying the risk factors of diabetes. It has two parts, theoretical and practical. The results of every test carried out with both analyses show a higher rate of correct prediction for Logistic Regression Analysis. The area under the ROC curve for all variables, which is used to compare the predictive power of the models, likewise indicates that Logistic Regression Analysis predicts the risk factors of diabetes best and provides the more appropriate model, so Logistic Regression Analysis emerges as a robust alternative to Linear Discriminant Analysis. According to logistic regression, the ranking of the risk factors for diabetes is as follows:

1- Family History, 2- BMI (obesity rate), 3- High Lipid in Blood, 4- Physical Activity, 5- Hypertension, 6- Diet (Nutrition) = 2.033; age, however, was not found to be a risk factor.
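The area under the ROC curve, used above to compare the predictive power of the two models, can be computed directly from predicted scores. The following is a minimal sketch, not the paper's procedure. The function name `roc_auc`, the labels, and both score lists are made up for illustration. The scores stand in for the predicted probabilities that two hypothetical fitted models might produce for the same six cases.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity:
    the share of (positive, negative) pairs in which the positive case
    receives the higher score, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative data only (assumed, not from the study):
# 1 = diabetes present, 0 = diabetes absent.
labels = [1, 1, 1, 0, 0, 0]
scores_model_a = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]  # hypothetical model A
scores_model_b = [0.7, 0.3, 0.4, 0.5, 0.6, 0.1]  # hypothetical model B

auc_a = roc_auc(labels, scores_model_a)
auc_b = roc_auc(labels, scores_model_b)
print(f"AUC model A: {auc_a:.3f}")  # 0.889
print(f"AUC model B: {auc_b:.3f}")  # 0.556
```

On these made-up scores model A has the larger area and would be preferred. An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.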

