For this project, I use the Medical Appointment No Shows dataset provided on Kaggle, which contains information collected from medical appointments in Brazil. The dataset has 110,527 samples and 14 columns.
Note: The No-show column says 'No' if the patient showed up and 'Yes' if the patient did not show up.
- Are non-alcoholic patients more likely to show up for their appointment?
- Could gender be related to showing up for the appointment?
- Do patients with diabetes also show up for their appointment?
#import useful packages
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
#load the dataset
df = pd.read_csv('No_show_appointment.csv')
#view first 5 rows
df.head()
df.shape
df.describe()
The table above summarises the whole dataset: for each numeric column it gives the sample count, the mean, and the minimum and maximum values.
df.info()
The above information clearly shows that there are no missing values in the dataset
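The missing-value check can also be made explicit rather than read off the `df.info()` output. The sketch below uses a small hypothetical frame named `toy` in place of the real data; with the notebook's `df`, the same two calls would apply.

```python
import pandas as pd

# Hypothetical toy frame standing in for the appointments data
toy = pd.DataFrame({'Age': [62, 56, 8], 'Gender': ['F', 'M', 'F']})

# Number of missing values in each column
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# True when no column contains a missing value
no_missing = bool((missing_per_column == 0).all())
print(no_missing)
```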
In this dataset, the 'ScheduledDay' and 'AppointmentDay' columns are stored as object (string) dtype, so converting them to datetime would allow date and time manipulations.
The 'Age' column also contains a value of -1, so the row with that value will be dropped.
Finally, in the No-show column, 'No' currently means the patient showed up and 'Yes' means the patient did not show up for the appointment.
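As an illustration of the datetime conversion mentioned above, here is a minimal sketch. It assumes the raw columns hold ISO 8601 timestamp strings; the three sample rows and the `waiting_days` column are hypothetical and are not part of the original notebook.

```python
import pandas as pd

# Hypothetical sample rows; the timestamp format is an assumption about the raw file
sample = pd.DataFrame({
    'ScheduledDay': ['2016-04-29T18:38:08Z', '2016-04-29T16:08:27Z', '2016-04-27T15:05:12Z'],
    'AppointmentDay': ['2016-04-29T00:00:00Z', '2016-05-03T00:00:00Z', '2016-04-29T00:00:00Z'],
})

# Convert the string columns to datetime
for col in ['ScheduledDay', 'AppointmentDay']:
    sample[col] = pd.to_datetime(sample[col])

# Example manipulation: calendar days between scheduling and the appointment
sample['waiting_days'] = (
    sample['AppointmentDay'].dt.normalize() - sample['ScheduledDay'].dt.normalize()
).dt.days

print(sample.dtypes)
print(sample['waiting_days'].tolist())
```

Once the columns are datetimes, the `.dt` accessor gives access to parts such as the weekday or the hour without any string parsing.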
The steps in this section are:
#the rows of age == -1 are dropped
df.drop(df[df['Age'] == -1].index, inplace =True)
Dropping the row where Age == -1
#unnecessary columns are dropped
df.drop([ 'Neighbourhood','ScheduledDay','AppointmentDay','SMS_received','Scholarship', 'Handcap', 'Age'],axis=1,inplace=True)
#the description is checked to confirm the dropped row
df.describe()
#check if the rows are dropped
df.info()
Replacing the values (No, Yes) with (Yes, No) respectively.
Lastly, the column names are renamed to lowercase.
#column names are changed to lowercase
new_column_names = {'PatientId': 'patient_id', 'AppointmentID': 'appointment_id',
'Gender': 'gender', 'ScheduledDay': 'scheduled_day',
'AppointmentDay': 'appointment_day',
'Scholarship': 'scholarship', 'Hipertension' :'hipertension', 'No-show': 'showed'}
df.rename(columns= new_column_names, inplace=True)
#the values yes,no are changed to no,yes respectively
df['showed'] = df['showed'].replace(['Yes', 'No'], ['No', 'Yes'])
#view first rows to see the changes made
df.head()
#copy the data to df_1
df_1 = df.copy()
df.Alcoholism.value_counts()
The output above shows the number of alcoholic and non-alcoholic patients in the dataset.
#for alcoholic patients and showing up
alcoholic = df.query('Alcoholism == 1')
alcoholic_showedup_count = alcoholic.groupby('showed')['patient_id'].count()
alcoholic_showedup_proportion = alcoholic_showedup_count/alcoholic_showedup_count.sum()
print(alcoholic_showedup_proportion)
#the proportion of alcoholic patients
alcoholic_showedup_proportion.plot(kind='bar')
plt.title('Proportion of alcoholic patients who showed up')
plt.ylabel('Proportion')
plt.xlabel('Showed up')
plt.xticks(rotation=0)
plt.show();
The query above shows that about 79.9% of alcoholic patients showed up for their appointment, and only 20.1% did not.
#for non-alcoholic patients and showing up
non_alcoholic = df.query('Alcoholism == 0')
non_alcoholic_showedup_count = non_alcoholic.groupby('showed')['patient_id'].count()
non_alcoholic_showedup_proportion = non_alcoholic_showedup_count/non_alcoholic_showedup_count.sum()
print(non_alcoholic_showedup_proportion)
non_alcoholic_showedup_proportion.plot(kind='bar')
plt.title('Proportion of non-alcoholic patients who showed up')
plt.ylabel('Proportion')
plt.xlabel('Showed up')
plt.xticks(rotation=0)
plt.show();
The query above also shows that around 79.8% of non-alcoholic patients showed up for their appointment, and only 20.2% did not.
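The two proportions above can also be produced side by side in a single table. The sketch below uses `pd.crosstab` with `normalize='index'` so that each row sums to 1. The `toy` frame is hypothetical, and its values are not the dataset's; with the notebook's `df`, the same call on `df['Alcoholism']` and `df['showed']` would apply.

```python
import pandas as pd

# Hypothetical toy data using the notebook's column names
toy = pd.DataFrame({
    'Alcoholism': [0, 0, 0, 0, 1, 1],
    'showed': ['Yes', 'Yes', 'Yes', 'No', 'Yes', 'No'],
})

# Rows: alcoholism flag, columns: showed, values: proportion within each row
rates = pd.crosstab(toy['Alcoholism'], toy['showed'], normalize='index')
print(rates)
```

Putting both groups in one table makes the comparison direct and avoids repeating the query, groupby, and division steps for each group.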
df.gender.value_counts()
The output above shows that there are almost twice as many female patients as male patients in the dataset.
# proportion of each gender for those who showed up
showed_up = df_1.query('showed == "Yes"')
showedup_gender_count = showed_up.groupby('gender')['patient_id'].count()
showedup_gender_prop = showedup_gender_count/showedup_gender_count.sum()
showedup_gender_prop.rename(index = {'F': 'Female', 'M': 'Male'}, inplace=True)
showedup_gender_prop.name = 'Gender'
print(showedup_gender_prop)
# Plot proportion of gender groups in showing up
showedup_gender_prop.plot(kind='pie', figsize=(7,5))
plt.title('Showed proportion of Gender Groups')
plt.xticks(rotation=0)
plt.legend();
print(showedup_gender_prop)
From the distribution above, about 64.9% of those who showed up were female, while 35.1% were male.
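The share of each gender among those who showed up depends on how many patients of each gender are in the dataset, so it can help to look at the show-up rate within each gender as well. The sketch below is one way to compute both. The `toy` frame is hypothetical and was chosen so that the two genders have the same show-up rate but different shares.

```python
import pandas as pd

# Hypothetical toy data: 6 female and 3 male patients
toy = pd.DataFrame({
    'gender': ['F'] * 6 + ['M'] * 3,
    'showed': ['Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No'],
})

# Share of each gender among patients who showed up (what the pie chart displays)
share_among_showed = toy.query('showed == "Yes"')['gender'].value_counts(normalize=True)

# Show-up rate within each gender
rate_within_gender = (toy['showed'] == 'Yes').groupby(toy['gender']).mean()

print(share_among_showed)
print(rate_within_gender)
```

In this toy example, women are two thirds of those who showed up, yet both genders show up at the same rate, which is why the within-gender rate is the more direct measure of any relationship.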
df.groupby('gender')['hipertension'].value_counts()
df.Diabetes.value_counts()
# Calculate the show-up proportion of patients with diabetes
Diabetes = df.query('Diabetes == 1')
Diabetes_showedup_count = Diabetes.groupby('showed')['patient_id'].count()
Diabetes_showedup_proportion = Diabetes_showedup_count/Diabetes_showedup_count.sum()
Diabetes_showedup_proportion.name = 'Diabetes & showed up?'
# Plot the proportion
Diabetes_showedup_proportion.plot(kind='bar', figsize=(8, 5))
plt.title('Proportions of patients with diabetes & showed up')
plt.xticks(rotation=0)
plt.show();
print(Diabetes_showedup_proportion)
The query above shows that around 82.0% of patients with diabetes showed up, and only 18.0% did not.
non_Diabetes = df.query('Diabetes == 0')
non_Diabetes_showedup_count = non_Diabetes.groupby('showed')['patient_id'].count()
non_Diabetes_showedup_proportion = non_Diabetes_showedup_count/non_Diabetes_showedup_count.sum()
non_Diabetes_showedup_proportion.name = 'non_diabetes & showed up?'
# Plot the proportion
non_Diabetes_showedup_proportion.plot(kind='bar', figsize=(8, 5))
plt.title('Proportions of patients without diabetes & showed up')
plt.xticks(rotation=0)
plt.show();
print(non_Diabetes_showedup_proportion)
The query above shows that around 79.6% of patients without diabetes showed up for their appointment, and 20.4% did not.
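The same filter, count, and divide pattern is repeated for each condition above, so it could be wrapped in a small helper. The function name `show_up_proportions` and the `toy` frame below are hypothetical. This is a sketch of one possible refactoring, not the notebook's own code.

```python
import pandas as pd

def show_up_proportions(data, condition_col, value):
    """Return the proportions of 'Yes'/'No' in `showed` for rows where condition_col == value."""
    subset = data[data[condition_col] == value]
    return subset['showed'].value_counts(normalize=True)

# Hypothetical toy data using the notebook's column names
toy = pd.DataFrame({
    'Diabetes': [1, 1, 1, 1, 0, 0, 0, 0, 0],
    'showed': ['Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'No', 'No'],
})

with_diabetes = show_up_proportions(toy, 'Diabetes', 1)
without_diabetes = show_up_proportions(toy, 'Diabetes', 0)
print(with_diabetes)
print(without_diabetes)
```

The returned Series can be passed straight to `.plot(kind='bar')`, as in the cells above.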
Around 79.9% of alcoholic patients showed up for their appointment, compared with around 79.8% of non-alcoholic patients. Hence, non-alcoholic patients are not more likely to show up for their appointment, which suggests that alcoholism is not related to showing up.
From the distributions above, 64.9% of the patients who showed up for their appointment were female and 35.1% were male. Female patients therefore make up the larger share of those who showed up, but they also outnumber male patients almost two to one in the dataset, so this split largely mirrors the dataset's composition. Comparing show-up rates within each gender would be needed before concluding that gender is related to showing up.
Around 82.0% of patients with diabetes showed up for their appointment, so only 18.0% of patients with diabetes did not. On the other hand, around 79.6% of patients without diabetes showed up, and 20.4% did not.
Hence, patients with diabetes are slightly more likely to show up for their appointment than those without diabetes.