Talk dirty data to me: Google Data Analyst Course 4 recap
I know this headline is corny but I couldn’t resist. Loving SQL and spreadsheets was not on my bingo card for 2022, but here we are! My search history is now interspersed with things like:
- Correct syntax for calculating…
- Dollar sign row column in spreadsheets meaning
- How to … in excel
Three weeks ago, I completed the fourth course in the Google Data Analytics Professional Certificate: Process Data from Dirty to Clean, taught by Sally Kim.
I learned a lot about the data cleaning process, calculating margin of error and sample sizes, documenting results and modifications in a changelog, etc.
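For the curious, here's a rough Python sketch of the textbook formulas for margin of error and sample size when estimating a proportion. It assumes a large population and the most cautious guess of a 50% proportion; the function names are made up for illustration, and this is not necessarily how the calculator used in the course works.

```python
import math

# z-scores for common confidence levels (standard normal distribution)
Z_SCORES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}


def margin_of_error(sample_size, confidence=0.95, proportion=0.5):
    """Margin of error for a proportion, assuming a large population."""
    z = Z_SCORES[confidence]
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


def required_sample_size(margin, confidence=0.95, proportion=0.5):
    """Smallest sample size that keeps the margin of error at or below `margin`."""
    z = Z_SCORES[confidence]
    return math.ceil(z ** 2 * proportion * (1 - proportion) / margin ** 2)


print(round(margin_of_error(400), 3))  # 0.049, i.e. about +/- 4.9%
print(required_sample_size(0.05))      # 385
```

So surveying 400 people gives roughly a 5% margin of error at 95% confidence, and going the other way, a 5% margin needs about 385 responses.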
I practiced uploading datasets to BigQuery, creating tables from them, and writing SELECT-FROM-WHERE queries in SQL. In spreadsheets, I practiced making pivot tables, applying conditional formatting, using VLOOKUP, and more.
It was fascinating! But beyond all that, my biggest takeaway is that being curious, being observant, and understanding the business objective are all valuable.
I’m proud of myself for:
- Completing this course.
- Figuring out the Excel equivalent even though the course focuses on Google Sheets.
- Having the patience and attention to detail to troubleshoot the problems I encountered.
- Using simple intuition to put 2 and 2 together, like going ahead of the instructor to use the VLOOKUP function to find a matching value once I understood how the pivot table works.
- Coming up with a simple cleaning routine by inspecting the dataset column by column and determining what the dirty data might look like based on the type and questions to be answered. I practiced this with a small sample and I’ve never been so pleased to find errors. I can’t wait to improve and create an efficient system for larger datasets.
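A column-by-column inspection like the one in the last point could be sketched in plain Python as below. This is only an illustration under assumptions: the sample rows are invented, the helper names are hypothetical, and the checks (blank cells, stray whitespace, values that don't match the expected type, duplicate rows) are just a few common symptoms of dirty data, not a complete routine.

```python
from collections import Counter


def inspect_column(rows, column, expected_type=str):
    """Count common dirty-data symptoms in one column of a list of dict rows."""
    issues = Counter()
    for row in rows:
        value = row.get(column)
        if value is None or str(value).strip() == "":
            issues["missing"] += 1
            continue
        text = str(value)
        if text != text.strip():
            issues["extra_whitespace"] += 1
        if expected_type is not str:
            try:
                expected_type(text.strip())  # e.g. int("forty") raises ValueError
            except ValueError:
                issues["wrong_type"] += 1
    return dict(issues)


def count_duplicate_rows(rows):
    """Count rows that exactly repeat an earlier row."""
    seen = Counter(tuple(sorted(row.items())) for row in rows)
    return sum(n - 1 for n in seen.values() if n > 1)


# Invented sample data, small enough to check by eye.
sample = [
    {"name": "Ada", "age": "36"},
    {"name": " Grace", "age": "forty"},
    {"name": "", "age": "29"},
    {"name": "Ada", "age": "36"},
]

# Each column is paired with the type its values are expected to have.
for column, expected in [("name", str), ("age", int)]:
    print(column, inspect_column(sample, column, expected))
print("duplicate rows:", count_duplicate_rows(sample))
```

Pairing each column with its expected type mirrors the idea of guessing what the dirty data might look like before looking for it.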
Right now, though, I’m happy with my progress, because 4 months ago my notes would have looked like gibberish. At the time, I told myself not to lose heart when seeing terms I didn’t understand because everyone starts with zero knowledge. And well, I didn’t lie.