Intro to Python for Data Science

Earlier this year, I had the opportunity to give a 30-minute water-cooler talk to the NASA Datanauts cohort. My topic was an introduction to Python for data science.

This is the link to the GitHub repository.

I have a couple of slides that introduce Python and some object-oriented concepts, but that’s not why I’m writing this post. Instead, I want to talk about the approach I took when planning this tutorial.

It seems like everywhere I look, I can find good tutorials on performing particular manipulations in Pandas or making plots in Matplotlib, and those tutorials are often written and presented by people who know the technology far better than I do.

Instead, I wanted to present the things I couldn’t easily find on the Internet.

How do people debug?

How should I think through a problem?

I wanted people to come away with a sense of what to do when they got stuck and how to interrogate a problem. To build that toolkit, I introduced the tools and then walked through as much of a “real-world” data science problem as I could.

The toolkit I described included:

dir()

dir(obj) returns the names of an object’s attributes and methods, which makes it a quick way to interrogate an unfamiliar object.
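
As a quick illustration (a sketch, not taken from the talk materials), here is dir() applied to a small hypothetical Sensor class defined only for this example. Filtering out the double-underscore names and checking callable() separates plain attributes from methods.

```python
class Sensor:
    """Hypothetical class, defined only to have something to inspect."""

    def __init__(self, name):
        self.name = name  # a data attribute stored on the instance

    def reading(self):  # a method defined on the class
        return 42.0


sensor = Sensor("thermometer")

# dir() returns a sorted list of every attribute name, including inherited
# "dunder" names such as __init__; drop those to see the public interface.
public = [name for name in dir(sensor) if not name.startswith("_")]

# Split the public names into plain attributes and callable methods.
attributes = [name for name in public if not callable(getattr(sensor, name))]
methods = [name for name in public if callable(getattr(sensor, name))]

print("attributes:", attributes)
print("methods:", methods)
```

The startswith("_") filter on its own is often enough to make the dir() output of a large object readable.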

inspect

The inspect module is literally magic. I first encountered it at a coderetreat, where it was used in a (quite antagonistic) pair programming session. Since then, I’ve worked on a lot of projects where a quick inspect.getsourcelines(obj) showed me what was happening under the hood, much faster than searching Google for the documentation.
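
A minimal sketch of that habit, wrapped in a hypothetical helper called source_preview. It uses json.dumps from the standard library only because that function is written in pure Python. Built-ins implemented in C, such as len, have no Python source, and inspect raises TypeError for them, so the helper returns None in that case.

```python
import inspect
import json


def source_preview(obj, n_lines=5):
    """Hypothetical helper: return (first line number, first n_lines of source).

    Returns None when obj has no retrievable Python source.
    """
    try:
        lines, start = inspect.getsourcelines(obj)
    except TypeError:
        # Raised for objects implemented in C, e.g. built-in functions.
        return None
    except OSError:
        # Raised when the source file cannot be located.
        return None
    return start, "".join(lines[:n_lines])


preview = source_preview(json.dumps)
if preview is not None:
    start, text = preview
    print(f"json.dumps starts at line {start}:")
    print(text)

print(source_preview(len))  # None, because len is a C built-in
```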

Last but not least, reading the source code, of course! One of my recent hobbies has been going to GitHub to read the Python source code of whatever module I’m having trouble with. Often, by understanding the inheritance patterns and reading the class definitions, I get a better sense of the logic behind the codebase and the design choices that were made.
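
Reading the code on GitHub works well. As a local complement (an assumption about workflow, not something from the talk), the inspect module can also tell you where a class lives on disk and what its inheritance chain looks like before you open the file. argparse.ArgumentParser is just an arbitrary pure-Python example from the standard library.

```python
import argparse
import inspect

cls = argparse.ArgumentParser

# The method resolution order (MRO): the classes Python searches, in order,
# when it looks up an attribute on an ArgumentParser instance.
for klass in inspect.getmro(cls):
    print(f"{klass.__module__}.{klass.__qualname__}")

# The file that defines the class, i.e. the file to open and read.
print(inspect.getsourcefile(cls))

# Public names defined directly in this class body (not inherited).
own_names = sorted(name for name in vars(cls) if not name.startswith("_"))
print(own_names)
```

Comparing vars(cls) against dir(cls) is a quick way to see which behavior a class defines itself and which it inherits from its parents.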

Since I gave the talk, I’ve added a couple more things to my toolkit – but that’s fodder for another blog post.

PyCon 2017 Postmortem

The first half of this year has been full of travel for me; I’m hoping to post a data visualization of it at some point. In the meantime, I’d like to write a bit about PyCon.

PyCon (US) this year was in Portland, Oregon, home of Powell’s Bookstore, Voodoo Doughnut, Salt & Straw Ice Cream, and Blue Star Donuts. Of these, Salt & Straw and Powell’s were my favorites.

Photo: Powell’s Bookstore

While my colleague and I were at the coffee shop inside Powell’s, we decided to take some time to catch up on work. I didn’t intend to break Python 3 on my personal computer the day before tutorials started, but that’s basically how it went.

Kelsey Hightower’s brilliant closing keynote began with a recap of some of the rules about Python. One of them:

The first rule of Python: never mess with system Python

Of course, I had already broken that rule (rendering the GUI on my personal laptop unusable) the night before PyCon started.

Okay, now that the embarrassing stories are out of the way, let’s get to the fun stuff. My main takeaways from the tutorials I attended were:

  1. There are matplotlib wizards in the world, and I would like to someday be one.
  2. I need to find an excuse to use Bayesian Machine Learning at work.
  3. I have a couple of new reference points for how to test my code and my data.

I have to say that the testing BoF (Birds-of-a-feather) gathering was one of my favorite events. I enjoyed the conversation and the lightning talks, and it was a good way to unwind from a day of hard thinking.

Of the talks I attended, I probably appreciated most the ones that walked through CPython’s C source code: several on the internals of Python’s GIL, and one on dictionaries. Other talks stepped through bytecode during the presentation. I found both exercises incredibly helpful, if only because they gave me a couple more tools for problem-solving and learning in Python.
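
For anyone who wants to try the bytecode exercise at home, the standard-library dis module is one way to do it. The mean function below is a hypothetical example, and the exact instructions printed differ between Python versions, since the bytecode format is a CPython implementation detail.

```python
import dis


def mean(values):
    """Hypothetical function, used only to have some bytecode to look at."""
    return sum(values) / len(values)


# Human-readable disassembly: one line per bytecode instruction.
dis.dis(mean)

# The same instructions as structured objects, handy for filtering.
opnames = [instruction.opname for instruction in dis.get_instructions(mean)]
print(opnames)
```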

If I’m still doing math and programming for my day job a year from now, I would 100% go to PyCon again.