Digests » 28
‘Cleave, a verb, has two very different meanings. It can describe cutting or splitting something apart with a sharp instrument, or — oddly enough — it can describe sticking to something like glue.’
This is Episode 1 of the PyderPuffGirls†, a tutorial on automating the boring parts of data analysis that we will work through over the next 8 weeks. I’m writing this tutorial for people who have had at least one false start in learning Python, just as I did two years ago.
Since this is a very introductory look at model selection, we assume the data you’ve acquired has already been cleaned and scrubbed and is ready to go. Data cleaning is a whole subject in and of itself, and it is often the primary time sink for any Data Scientist. Go to the end of this article if you want to download the data for yourself and follow along!
Most beginner TensorFlow tutorials introduce the reader to the feed_dict method of loading data into a model, where data is passed to TensorFlow through the tf.Session.run() or tf.Tensor.eval() function calls. There is, however, a much better and arguably easier way of doing this: using the tf.data API, you can create high-performance data pipelines in just a few lines of code.
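To make the tf.data idea concrete, here is a minimal sketch rather than the original tutorial's code. It assumes TensorFlow 2.x with eager execution enabled (so no tf.Session is needed), and the toy arrays, the build_pipeline helper, and the scaling step are all hypothetical names and choices made up for illustration.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data: 6 samples with 2 features each, plus integer labels.
features = np.arange(12, dtype=np.float32).reshape(6, 2)
labels = np.array([0, 1, 0, 1, 0, 1], dtype=np.int64)


def build_pipeline(features, labels, batch_size=2):
    """Build a small tf.data input pipeline (illustrative helper, not a TF API)."""
    # Slice the arrays along the first axis: one (feature, label) pair per sample.
    dataset = tf.data.Dataset.from_tensor_slices((features, labels))
    # Apply a per-sample transformation; here an arbitrary scaling of the features.
    dataset = dataset.map(lambda x, y: (x / 10.0, y))
    # Group consecutive samples into batches.
    dataset = dataset.batch(batch_size)
    # Prepare the next batch while the current one is being consumed.
    dataset = dataset.prefetch(1)
    return dataset


# In eager mode the dataset is a plain Python iterable of tensors.
batches = [(x.numpy(), y.numpy()) for x, y in build_pipeline(features, labels)]
```

Each element of batches is a (features, labels) pair of NumPy arrays. In a real training loop you would typically iterate over the dataset directly, or pass it to a Keras model's fit method, instead of materializing it as a list.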
All researchers are familiar with the importance of delivering a paper that is written in a clean and organized way. However, the same often cannot be said about the way we organize and maintain the code and data used in the backend (i.e. the code and data layer) of a research project. This part of a project is usually not visible, and good intentions to keep it organized tend to be among the first things to fly out the window when a deadline is approaching.