Earlier this year, I had the opportunity to give a 30-minute water cooler talk to the NASA Datanauts cohort. I spoke on an Introduction to Python for Data Science.
I have a couple of slides that introduce Python and some object-oriented concepts, but that’s not why I’m writing this post. Instead, I’m going to talk about the approach I took when I was thinking about this tutorial.
It seems like anywhere I look, I can find good tutorials on how to perform certain manipulations in Pandas, or do some plotting in Matplotlib, and those tutorials are often written and presented by people who are much more familiar than I am with the technology at hand.
Instead, I was interested in presenting the things I didn’t find easily on the Internet.
How do people debug?
How should I think through a problem?
I wanted people to come away with a sense of what to do when they got stuck, and how to interrogate problems. To provide that toolkit, I first introduced the tools, then walked through a data science problem that was as close to “real-world” as possible.
The toolkit I described included:
dir(obj) lists the names of an object’s attributes, which allows one to interrogate its properties and methods.
The inspect module is literally magic. I first encountered it at a coderetreat, where it was used in a (quite antagonistic) pair programming session. Since then, I’ve worked on a lot of projects where a quick inspect.getsourcelines(obj) lets me see what’s happening under the hood, and it’s much faster than finding the documentation on Google.
Last but not least, reading the source code, of course! One of my recent hobbies has been to go to GitHub to read the Python source code of whatever module I’m having trouble with. Often, by understanding what the inheritance patterns look like and reading class definitions, I’m able to get a better sense of the logic behind the codebase and the choices that were made.
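To make the first two tools concrete, here is a minimal sketch of what that interrogation can look like in a Python session. The standard-library json module is only a stand-in I picked for illustration, not an example from the talk; any module or object you are stuck on would work the same way.

```python
import inspect
import json  # stand-in target for illustration; swap in whatever you are debugging

# dir() returns the attribute names on an object; filtering out the
# underscore-prefixed names leaves the public surface.
public_names = [name for name in dir(json) if not name.startswith("_")]
print(public_names)

# inspect.getsourcelines() returns a tuple: the list of source lines for the
# object, and the line number in the file where that source begins.
source_lines, first_line = inspect.getsourcelines(json.dumps)
print(f"json.dumps starts at line {first_line} of {inspect.getsourcefile(json.dumps)}")

# Print just the first few lines to get a feel for the signature.
print("".join(source_lines[:5]))
```

One caveat: inspect can only show source that exists as Python code on disk. For objects implemented in C (many builtins, for instance), getsourcelines raises an error, and that is usually the moment to go read the project’s repository instead.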
Since I gave the talk, I’ve added a couple more things to my toolkit – but that’s fodder for another blog post.