Today I worked through a personal assignment: a set of coding questions about passengers in the Titanic dataset.
The data comes from Kaggle.
https://www.kaggle.com/competitions/titanic/data?select=train.csv
Data file
I wrote Python code based on this data. It still has plenty of room for improvement, and my approach to coding is inefficient and crude. However, the purpose of this exercise is to practice handling tabular data in Python and to learn to code. Through this practice, I hope to develop a more concise and efficient coding style.
Problem 1: Loading the Data
- Question) Load the Titanic data and store it in a variable called df. Then, examine the contents of the data.
import pandas as pd

# Path to the Kaggle train.csv on my machine
file_path = "/Users/junhyunlee/Desktop/data/titanic/train.csv"
df = pd.read_csv(file_path)
print(df.head())  # examine the first five rows
Problem 2: Calculating the Number of Survivors
- Question) Calculate and output the total number of survivors and the number of fatalities on the Titanic.
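a = df['Survived']

def cnt_survival(a):
    # Count the rows where Survived == 1
    count = 0
    for i in a:
        if i == 1:
            count += 1
    return count

survivors = cnt_survival(a)
print(survivors)  # result 342
print(len(a) - survivors)  # fatalities: 549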
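As a possible shortcut, pandas can tally both outcomes in one call. The sketch below runs on a tiny made-up DataFrame (`sample`, a hypothetical stand-in for train.csv), so its numbers are illustrative only. It assumes `Survived` holds 1 for survivors and 0 for fatalities, as in the Kaggle data.

```python
import pandas as pd

# Hypothetical stand-in for train.csv (not real Titanic rows).
sample = pd.DataFrame({"Survived": [1, 0, 0, 1, 0]})

# value_counts() tallies how often each distinct value appears.
counts = sample["Survived"].value_counts()

survivors = int(counts.get(1, 0))   # rows where Survived == 1
fatalities = int(counts.get(0, 0))  # rows where Survived == 0
print(survivors, fatalities)
```

Using `.get(..., 0)` keeps the lookup from failing if one of the two outcomes never appears in the column.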
Problem 3: Calculating the Average Age
- Question) Calculate and output the average age of Titanic passengers.
s = df['Age'].dropna()  # drop missing ages so they do not affect the mean

def avg_age(s):
    total = sum(s)
    count = len(s)  # NaN values were already removed by dropna()
    if count > 0:
        return total / count
    return 0

print(avg_age(s))  # result = 29.69911764705882
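The same average can probably be obtained without a hand-written function. This is a minimal sketch on made-up ages (the `ages` Series is hypothetical, not taken from the dataset) that relies on `Series.mean()` skipping missing values by default.

```python
import pandas as pd

# Hypothetical ages; None becomes NaN, mimicking missing Age values.
ages = pd.Series([22.0, None, 38.0, 30.0])

# Series.mean() skips NaN by default (skipna=True), so no dropna() is needed.
mean_age = ages.mean()
print(mean_age)
```

On the real data this would correspond to `df['Age'].mean()`.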
Problem 4: Calculating the Number of Female Survivors
- Question) Calculate and output the number of female survivors among Titanic passengers.
# zip() returns a one-shot iterator, so it can only be looped over once
zipped = zip(df['Sex'], df['Survived'])

def cnt(zipped):
    count = 0
    for sex, survived in zipped:
        if sex == 'female' and survived == 1:
            count += 1
    return count

print(cnt(zipped))  # result = 233
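An alternative to looping over zipped columns is a boolean mask. The sketch below uses a small made-up DataFrame (`sample` is hypothetical) and assumes the same `Sex` and `Survived` column names as the Kaggle file.

```python
import pandas as pd

# Hypothetical rows standing in for train.csv.
sample = pd.DataFrame({
    "Sex": ["female", "male", "female", "female"],
    "Survived": [1, 1, 0, 1],
})

# Each comparison yields a boolean Series; & combines them row by row.
mask = (sample["Sex"] == "female") & (sample["Survived"] == 1)

# True counts as 1 when summed, so the sum is the number of matching rows.
female_survivors = int(mask.sum())
print(female_survivors)
```

The parentheses around each comparison matter, because `&` binds more tightly than `==` in Python.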
Problem 5: Finding the Passenger with the Most Family Members
- Question) Among passengers with families, find the passenger with the most family members.
df['family'] = df['SibSp'] + df['Parch']
max_f = max(df['family'])
max_family = df[df['family'] == max_f]
print(max_family[['Name', 'family','SibSp', 'Parch']])
#result
                                 Name  family  SibSp  Parch
159        Sage, Master. Thomas Henry      10      8      2
180      Sage, Miss. Constance Gladys      10      8      2
201               Sage, Mr. Frederick      10      8      2
324          Sage, Mr. George John Jr      10      8      2
792           Sage, Miss. Stella Anna      10      8      2
846          Sage, Mr. Douglas Bullen      10      8      2
863  Sage, Miss. Dorothy Edith "Dolly"      10      8      2
Problem 6: Extracting Passengers of a Specific Age Group
- Question) Extract the names of passengers aged 20 or younger to complete a dictionary where the passengers' names are keys and their ages are values.
zipped = dict(zip(df['Name'], df['Age']))
result = {}
for name, age in zipped.items():
    # A missing age is NaN, and NaN <= 20 is False, so those passengers are skipped
    if age <= 20:
        result[name] = age
print(result)
#result = {'Palsson, Master. Gosta Leonard': 2.0, 'Nasser, Mrs. Nicholas (Adele Achem)': 14.0, 'Sandstrom, Miss. Marguerite Rut': 4.0, 'Saundercock, Mr. William Henry': 20.0, 'Vestrom, Miss. Hulda Amanda Adolfina': 14.0, 'Rice, Master. Eugene': 2.0, 'McGowan, Miss. Anna "Annie"': 15.0, 'Palsson, Miss. Torborg Danira': 8.0, 'Fortune, Mr. Charles Alexander': 19.0, 'Vander Planke, Miss. Augusta Maria': 18.0, 'Nicola-Yarred, Miss. Jamila': 14.0, 'Laroche, Miss. Simonne Marie Anne Andree': 3.0, 'Devaney, Miss. Margaret Delia': 19.0, 'Arnold-Franchi,
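The filtering can also be done on the DataFrame before the dictionary is built. This is only a sketch on made-up rows (`sample` and its placeholder names are hypothetical), assuming the `Name` and `Age` columns of the Kaggle file.

```python
import pandas as pd

# Hypothetical rows; None becomes NaN, mimicking a missing Age.
sample = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [2.0, None, 20.0, 35.0],
})

# NaN <= 20 evaluates to False, so rows with a missing age drop out here.
young = sample[sample["Age"] <= 20]

# Pair each remaining name with its age.
result = dict(zip(young["Name"], young["Age"]))
print(result)
```

Filtering first means the dictionary is only ever built from rows that satisfy the age condition.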
Problem 7: Finding the Cabin Class with the Most Passengers
- Question) Find the cabin class that had the most passengers on the Titanic.
a = df['Pclass']

def count(a):
    # Tally the passengers in each of the three classes
    class_1 = 0
    class_2 = 0
    class_3 = 0
    for i in a:
        if i == 1:
            class_1 += 1
        elif i == 2:
            class_2 += 1
        elif i == 3:
            class_3 += 1
    return class_1, class_2, class_3

class_1, class_2, class_3 = count(a)
print("Pclass_1:", class_1)
print("Pclass_2:", class_2)
print("Pclass_3:", class_3)
#result
Pclass_1: 216
Pclass_2: 184
Pclass_3: 491
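To pick out the largest class directly instead of reading it off the printed counts, one option is `value_counts()` followed by `idxmax()`. The sketch below uses made-up class values (`sample` is hypothetical), so the output does not reflect the real dataset.

```python
import pandas as pd

# Hypothetical class values standing in for df['Pclass'].
sample = pd.DataFrame({"Pclass": [3, 1, 3, 2, 3, 1]})

# Count the passengers per class.
counts = sample["Pclass"].value_counts()

top_class = counts.idxmax()      # class label with the highest count
top_count = int(counts.max())    # the count itself
print(top_class, top_count)
```

This avoids writing one counter variable per class, so it would keep working if the column held more than three distinct values.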
Problem 8: Printing Information of the Passenger with the Highest Fare
- Question) Find the passenger on the Titanic who paid the highest fare.
a = df['Fare']
max_fare = max(a)
s = df[df['Fare'] == max_fare]
print(s[['Name', 'Fare']])
#result
                                   Name      Fare
258                    Ward, Miss. Anna  512.3292
679  Cardeza, Mr. Thomas Drake Martinez  512.3292
737              Lesurer, Mr. Gustave J  512.3292
Problem 9: Calculating the Survival Rate for Each Gender
- Question) Calculate the survival rate for each gender (male/female) on the Titanic.
sm = df[(df['Sex'] == 'male') & (df['Survived'] == 1)]['Sex'].count()  # male survivors
tm = df[df['Sex'] == 'male']['Sex'].count()  # all male passengers
sfm = df[(df['Sex'] == 'female') & (df['Survived'] == 1)]['Sex'].count()  # female survivors
tfm = df[df['Sex'] == 'female']['Sex'].count()  # all female passengers
print(f'Male survival rate: {sm/tm}')
print(f'Female survival rate: {sfm/tfm}')
#result
Male survival rate: 0.18890814558058924
Female survival rate: 0.7420382165605095
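Because `Survived` is 0 or 1, the mean of that column within each gender is the survival rate, which suggests a `groupby` version. The following is a sketch on made-up rows (`sample` is hypothetical), not the author's method, and its rates are illustrative only.

```python
import pandas as pd

# Hypothetical rows standing in for train.csv.
sample = pd.DataFrame({
    "Sex": ["male", "male", "female", "female", "male", "female"],
    "Survived": [0, 1, 1, 1, 0, 0],
})

# Group by gender, then average the 0/1 Survived flag in each group.
rates = sample.groupby("Sex")["Survived"].mean()
print(rates)
```

The result is a Series indexed by gender, so `rates["male"]` and `rates["female"]` give the two rates without four separate filters.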
Problem 10: Finding the Most Common Departure Port
- Question) Find the port from which the most passengers departed, and output the number of passengers who departed from that port.
s = df['Embarked'].dropna()

def count_emb(s):
    cnt_s = 0
    cnt_c = 0
    cnt_q = 0
    for i in s:
        if i == 'S':
            cnt_s += 1
        elif i == 'C':
            cnt_c += 1
        elif i == 'Q':
            cnt_q += 1
    # Return the largest count together with its port code
    max_cnt = max(cnt_s, cnt_c, cnt_q)
    if max_cnt == cnt_s:
        return cnt_s, 'S'
    elif max_cnt == cnt_c:
        return cnt_c, 'C'
    else:
        return cnt_q, 'Q'

max_value, max_name = count_emb(s)
print(max_name, max_value)
#result = S 644
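If the loop style is kept, the standard library's `collections.Counter` can replace the three hand-written counters. This is a minimal sketch on a made-up list (`ports` is hypothetical, with `None` standing in for a missing value), so the numbers are illustrative only.

```python
from collections import Counter

# Hypothetical port codes; None stands for a missing value.
ports = ["S", "C", "S", None, "Q", "S"]

# Count every non-missing code without naming the ports in advance.
counts = Counter(p for p in ports if p is not None)

# most_common(1) returns a list with the single (value, count) pair
# that has the highest count.
max_name, max_value = counts.most_common(1)[0]
print(max_name, max_value)
```

With the real column, `Counter(df['Embarked'].dropna())` should work the same way, since `dropna()` has already removed the missing entries.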
I spent the whole day solving these questions. I still don't know whether my answers are correct.
I plan to compare them with the correct answers this Friday and look for areas to improve.
By solving these questions, I finally got to manipulate data directly with Python.
For now, SQL feels more intuitive and easier to handle than Python to me. I'm still at a basic level, and I noticed many areas where I'm lacking. While solving today's questions I learned about various Python features, and I think I need to use Python much more to become familiar with it.