Statistics for Data Science. (T-SQL Tuesday 108)

The invitation to this blog party asks for one thing I want to learn that is not SQL Server.  The TL;DR answer is: statistics for data science.

I started working on this earlier this year.  From June to October I took the “Business Intelligence and Data Analytics” certificate program at the University of Victoria. Each class started with a weekend on campus and was followed by a month of assignments to complete off campus. The three classes were:

  1. Business Intelligence and Data Analytics: Basics
  2. Business Intelligence: Dashboard Design
  3. Data Analytics: Model Design

The coursework was all R. I had not used R previously and gotta say I didn’t love it, but I learned enough to ace the assignments, and I learned how to google for what I needed. (Let’s face it, that’s half the battle with any new language.) By the end of the course we were scraping data from multiple data sources and formats, manipulating the data into a data analytics project and then building descriptive and diagnostic models on that data.

The biggest challenge I had with the course was the statistics. I’ve taken some form of Stats 101 a few times (first when taking commerce at university, and again when taking programming at college), but nothing that would prepare me for data science. I want a much better understanding of the concepts so I can actually DO a data science project.

Last Monday at PASS Summit I took the “Advanced R” pre-con by Dejan Sarka, which was a great follow-up to the course I’d just finished, but he reinforced what I’d been feeling… I need more stats.

The next few months will be busy, so I’ve tried to keep my next steps realistic.  They are to:

  1. In the next couple weeks: Use my commute to listen to an audiobook.  I’ll probably start with “Naked Statistics: Stripping the Dread from the Data”.  If you, dear reader, have a podcast or other audiobook suggestion for me, I’d love to hear it (pun intended).
  2. Following that: Borrow the book: “Practical Statistics for Data Scientists” from work and read up on at least 3 topics (I can’t commit to reading a whole book… I have no time for reading print until next year).
  3. After Christmas: Do a few Kaggle competitions and see where else I need to brush up.

Back to School

The countdown is on!  I start the first in a set of three university classes this weekend.  It’s been a long time since I attended a college or university class.  Fine, I’ll fess up.  It’s been over 18 years(!) since I got my “Computer Systems Technology” diploma from college.

I’m not worried about being the oldest kid in the room; I can deal with that.  It’s the class format that has me a touch concerned.  The class is partly “distance education”.  I’m very much in favour of distance learning, but only if it’s implemented well.

When I was on leave after the birth of my second child I started a distance program in Technology Management at a major technical school.  I started out quite excited about it.  It was 2010 and all the necessary technology existed.  I did the reading and followed the instructions and… wait!  Not so fast!  I couldn’t follow the instructions.  They all referred to something that didn’t exist, such as a case study on page 34 or questions on page 52, where there were no case studies or questions.  I didn’t get concerned right away; I thought I’d just alert the instructor and they’d fix it right up.  Except I got no response.  I submitted more questions that were also ignored.  When I brought it up with staff from the department I got the distinct impression there WAS NO INSTRUCTOR.  Someone must’ve been assigned the task of marking the assignments, but all the answers I got regarding the instructions were guesses.  They didn’t know the course or the subject matter at all.

That was the end of it for me.  I’m sure I could have gotten through the course, but why would I bother?  There’s no way I was going to put up with that for an entire program.  I wasn’t going to spend my time and money trying to get a degree from an institution that wouldn’t communicate with me.  I was sorely disappointed.

The part that surprised me the most was that nobody from the program ever followed up with me to see why I’d dropped the course.  Surely attrition is a problem for distance ed programs and a little customer service would go a long way?

I’m optimistic that this won’t be a repeat.  Each class starts with a weekend of in-person class time followed by 4 weeks of distance ed.  The instructor has been in touch already and I have instructions to install R and RStudio.  I’m looking forward to learning and practicing some Business Intelligence, Data Analytics and R.  I’ve some experience with BI, but always as a developer and I want more theory so I can contribute more to the design.