Days 8 & 9: End of the Orientation, Start of the Mundane
If I were to break this experience into stretches, I would call the last week and a half the orientation and yesterday the start of the more mundane part of the internship. I feel that I finally have a grasp of what we are doing, and I am comfortable with my duties as a student researcher and open-source developer.

Previously, I was tasked with finding the dataset for Chapter 2 of the How to Think Like a Data Scientist textbook. My goal was to search different open-source databases for a dataset resembling the chapter's original dataset, the World Happiness Rankings. The new dataset needed to be more relevant to Business students, since the class we are working on combines the perspectives of the Business and Computer Science departments. I searched many databases, including Kaggle, data.gov, Google Public Data, Awesome Public Datasets, opendata.aws, datacommons.org, and Public Data of the City of New York. After some time, I noticed that most of them offered large datasets on topics like the environment, space exploration, and urban planning. While this data could be useful for Business students, or anyone for that matter, we needed a more concise dataset that resembled the World Happiness Rankings and was directly related to business. I was excited to work with different datasets, but also slightly discouraged because I could not find the right dataset for the assignment.
Throughout the process, I realized that the World Happiness Rankings is not like the datasets I had been reviewing. Most of the open-source databases provide large datasets that require further processing and analysis, while the WHR is already processed and analyzed. Eventually, I found a dataset on website rankings across the globe that needed some cleaning but could be used for the chapter.
Today, after I introduced the dataset to the team, Imma and Sandesh spent some time thinking of different applications. We then reviewed the dataset against the chapter activities and realized that it is corrupt and does not properly represent the results for a particular website in each country. We decided to scrap all of the previous work on the dataset and find a new one. Sandesh found a new database, DoingBusiness.com, which provides economic data for countries around the world. This open-source database is part of the World Bank. There, I found a dataset on the Index for Starting a Business in different countries. The dataset works perfectly for our needs, as it introduces the idea of business from a more general perspective and mirrors the World Happiness Rankings well. I then added some variables to the dataset so the students can practice looking for potential correlation and causation. After uploading the dataset to Google Drive, we all attended the Runestone Interactive Workshop on creating assignments. The day started slowly, but we achieved some great results towards the end. I did not want to be the one announcing that all of the work we had done needed to be scrapped, but I wanted even less to create a low-quality product.