GPA prediction of college students

Question

I have a dataset consisting of 8 columns and 15,600 rows. The columns are:

1. Entry_academic_year: takes 5 discrete values (2558, 2559, 2560, 2561, 2562)
2. Faculty: the faculty the student enrolled in, e.g. engineering
3. branch: the branch the student enrolled in, e.g. software engineering
4. Admission type: how the student entered the college
5. Graduated_high_school: the high school the student graduated from
6. province_of_school
7. GPA_high_school: the student's GPA in high school
8. GPA_college: the student's GPA during college

I am trying to predict each student's college GPA as one of 4 quartile classes, split at the 25th, 50th, and 75th percentiles. The problem is that the Graduated_high_school column has around 1732 unique values, and some schools appear in only one row, which leaves the prediction accuracy at around 30-35%.
Any ideas on how to fix it?

Daren · Answer

You could first check whether Graduated_high_school is correlated in any way with GPA_college. If there is no correlation, try fitting a model without the Graduated_high_school column.
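Since Graduated_high_school is categorical and GPA_college is numeric, one way to run this check is the correlation ratio (eta squared), i.e. the share of GPA variance explained by the school grouping. This is my choice of measure for illustration, not the only option. A minimal sketch in plain Python, where the function name and the toy data are made up:

```python
from collections import defaultdict


def correlation_ratio(categories, values):
    """Return eta squared: the share of variance in `values`
    that is explained by the grouping in `categories`."""
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)

    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # all values identical, nothing to explain

    # Between-group sum of squares, weighted by group size
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in groups.values()
    )
    return ss_between / ss_total


# Toy data, invented for illustration only
schools = ["A", "A", "A", "B", "B", "B"]
gpas = [2.0, 2.2, 2.1, 3.5, 3.6, 3.4]
eta_sq = correlation_ratio(schools, gpas)
print(round(eta_sq, 3))
```

A value near 0 suggests the school says little about college GPA; a value near 1 suggests it says a lot. Be careful with schools that have only one row: a single-student group matches its own mean exactly, which inflates the score, so it is probably safer to compute this only on schools with a reasonable number of students.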
If the column does matter, you could instead drop the rows belonging to under-represented high schools. One problem I foresee, though, is that future predictions may involve Graduated_high_school values that never appeared in the training dataset (e.g. schools that weren't in the dataset, or someone applying your model to a dataset from another country). So, if Graduated_high_school is not important, I would consider dropping it altogether.
Alternatively, you could replace Graduated_high_school with related features, such as the number of teachers at the high school or its teacher-student ratio.
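A variation on dropping rows for under-represented schools is to pool those schools into a single catch-all category, which also gives unseen schools somewhere to go at prediction time. This is a sketch of that idea rather than something tested on your data; the `OTHER` label, the function names, and the `min_count` threshold are all hypothetical:

```python
from collections import Counter

OTHER = "OTHER"  # hypothetical label for pooled or unseen schools


def fit_frequent_schools(train_schools, min_count=20):
    """Return the set of schools with at least `min_count` training rows."""
    counts = Counter(train_schools)
    return {school for school, n in counts.items() if n >= min_count}


def pool_rare(schools, frequent):
    """Keep frequent schools as-is; map rare or unseen ones to OTHER."""
    return [s if s in frequent else OTHER for s in schools]


# Toy data, invented for illustration only
train = ["S1"] * 3 + ["S2"] * 2 + ["S3"]
frequent = fit_frequent_schools(train, min_count=2)

train_pooled = pool_rare(train, frequent)        # "S3" becomes OTHER
new_pooled = pool_rare(["S1", "S9"], frequent)   # unseen "S9" becomes OTHER
print(train_pooled, new_pooled)
```

The set of frequent schools is learned from the training data only and then reused for new data, so a school the model has never seen is handled the same way as a rare one instead of causing an error.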
