Statistics for Data Science (T-SQL Tuesday #108)

The invitation to this blog party is here: https://curiousaboutdata.com/2018/10/29/t-sql-tuesday-108-invitation-non-sql-server-technologies/ and it asks for one thing I want to learn that is not SQL Server.  The TL;DR answer is: statistics for data science.

I started working on this earlier this year.  From June to October I took the “Business Intelligence and Data Analytics” certificate program at the University of Victoria. Each class started with a weekend on campus and was followed by a month of assignments to complete off campus. The three classes were:

  1. Business Intelligence and Data Analytics: Basics
  2. Business Intelligence: Dashboard Design
  3. Data Analytics: Model Design

The coursework was all R. I had not used R previously and gotta say I didn’t love it, but I learned enough to ace the assignments and how to google for what I needed. (Let’s face it, that’s half the battle with any new language.) By the end of the course we were scraping data from multiple data sources and formats, manipulating the data into a data analytics project and then building descriptive and diagnostic models on that data.

The biggest challenge I had with the course was with the statistics. I’ve taken some form of Stats 101 a few times (first when taking commerce at university, and again when taking programming at college), but nothing that would prepare me for data science. I want a much better understanding of the concepts so I can actually DO a data science project.

Last Monday at PASS Summit (http://www.passsummit.com) I took the “Advanced R” pre-con by Dejan Sarka, which was a great follow-up to the course I’d just finished, but he reinforced what I’d been feeling… I need more stats.

The next few months will be busy, so I’ve tried to keep my next steps realistic. They are to:

  1. In the next couple weeks: Use my commute to listen to an audiobook.  I’ll probably start with “Naked Statistics: Stripping the Dread from the Data”.  If you, dear reader, have a podcast or other audiobook suggestion for me, I’d love to hear it (pun intended).
  2. Following that: Borrow the book: “Practical Statistics for Data Scientists” from work and read up on at least 3 topics (I can’t commit to reading a whole book… I have no time for reading print until next year).
  3. After Christmas: Do a few Kaggle competitions and see where else I need to brush up.

PivotChart Formatting

Once upon a time I was investigating whether Excel might be an acceptable tool for interfacing with some of our data.  I checked out PivotCharts and slicers and thought some interesting interactive ‘reports’ could be quickly built with them, so I set out to build a proof-of-concept demonstration.  It was going fine, until I noticed this issue.

Below are a couple of screenshots of scrubbed data for illustration.  Can you see the issue?

Before Date Slicer is applied

After Date Slicer is applied

For the colour-blind or anyone who doesn’t see the issue, I’ll explain.  The colours that represent the organizations changed from one view to the next.  Org B is orange in the first view and blue in the second.  I find that extremely unsettling.  I expect the orange thing in one view to represent the same thing even when the filter has changed!  What’s the point of the speedy application of filters if you have to stop and totally reevaluate the visual each time it renders?

I understand that the formatting (including the colour palette) is applied after the set has been filtered, and this will cause the attribute’s colour to change if its position within the set changes. I would argue that from a business perspective, the format of an attribute should be applied before filtering, so that it is retained from one view to the next.
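To make the difference concrete, here is a small Python sketch. It does not claim to show how Excel works internally; the palette, the organization list and both function names are made up for illustration. It only contrasts assigning colours by position after filtering with assigning them once from the full set and filtering afterwards.

```python
# Hypothetical palette and organization names, for illustration only.
PALETTE = ["blue", "orange", "grey", "yellow"]
ALL_ORGS = ["Org A", "Org B", "Org C"]


def colours_after_filtering(visible_orgs):
    """Assign colours by position within the already-filtered set."""
    return {org: PALETTE[i % len(PALETTE)] for i, org in enumerate(visible_orgs)}


def colours_before_filtering(all_orgs, visible_orgs):
    """Assign colours once from the full set, then keep only the visible orgs."""
    fixed = {org: PALETTE[i % len(PALETTE)] for i, org in enumerate(all_orgs)}
    return {org: fixed[org] for org in visible_orgs}


if __name__ == "__main__":
    filtered = ["Org B", "Org C"]  # pretend the date slicer removed Org A

    # Position-based: Org B changes colour when Org A drops out.
    print(colours_after_filtering(ALL_ORGS)["Org B"])   # orange
    print(colours_after_filtering(filtered)["Org B"])   # blue

    # Key-based: Org B keeps its colour regardless of the filter.
    print(colours_before_filtering(ALL_ORGS, filtered)["Org B"])  # orange
```

With the second approach the colour is tied to the organization rather than to its position in the filtered set, which is the behaviour I would expect from one view to the next.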

I did contact some folks at Microsoft in April of 2015 and they assured me it was on a list.  I’m still seeing this behavior today in Excel 2016 with an Office 365 subscription, which is the current version (according to https://products.office.com/en-ca/excel).  I await a new version to see if it’ll be resolved.

Back to School

The countdown is on!  I start the first in a set of three university classes this weekend.  It’s been a long time since I attended a college or university class.  Fine, I’ll fess up.  It’s been over 18 years(!) since I got my “Computer Systems Technology” diploma from college.

I’m not worried about being the oldest kid in the room; I can deal with that.  It’s the class format that has me a touch concerned.  The class is partly “distance education”.  I’m very much in favour of distance learning, but only if it’s implemented well.

When I was on leave after the birth of my second child I started a distance program in Technology Management at a major technical school.  I started out quite excited about it.  It was 2010 and all the necessary technology existed.  I did the reading and followed the instructions and… wait!  Not so fast!  I couldn’t follow the instructions.  They all referred to something that didn’t exist, such as a case study on page 34 or questions on page 52, where there were no case studies or questions.  I didn’t get concerned right away; I thought I’d just alert the instructor and they’d fix it right up.  Except I got no response.  I submitted more questions that were also ignored.  When I brought it up with staff from the department I got the distinct impression there WAS NO INSTRUCTOR.  Someone must’ve been assigned the task of marking the assignments, but all the answers I got regarding the instructions were guesses.  They didn’t know the course or the subject matter at all.

That was the end of it for me.  I’m sure I could have gotten through the course, but why would I bother?  There’s no way I was going to put up with that for an entire program.  I wasn’t going to spend my time and money trying to get a degree from an institution that wouldn’t communicate with me.  I was sorely disappointed.

The part that surprised me the most was that nobody from the program ever followed up with me to see why I’d dropped the course.  Surely attrition is a problem for distance ed programs and a little customer service would go a long way?

I’m optimistic that this won’t be a repeat.  Each class starts with a weekend of in-person class time followed by 4 weeks of distance ed.  The instructor has been in touch already and I have instructions to install R and RStudio.  I’m looking forward to learning and practicing some Business Intelligence, Data Analytics and R.  I’ve some experience with BI, but always as a developer and I want more theory so I can contribute more to the design.

The Good, the Bad, and the Ugly

This is about my first SQL Saturday speaking experience.

I’ve seen a lot of great sessions by PASS community members over the last 5 years, and thought I’d like to give it a try.  Local events here on my island are few and far between, so it’s been easy to not do anything about it.  But since I’ve taken on the lead of the local SQL Server user group I suddenly have an opportunity to give presentations locally.  One thing led to another and I submitted a session to SQL Saturday Edmonton and was selected as a speaker.

The Good:

The process of putting together the presentation was interesting.  Taking a topic and breaking it down into a logical progression of theory and instruction took longer than I thought it would.  I learned that indeed, the best way to learn a thing is to prepare to teach it!  Even though I’ve had lots of hands-on experience with the topic I presented, I was still forced to learn the parts I’d never needed to use, and its history, and to look for alternatives, etc.

I gave the talk locally before heading to SQL Saturday.  I had been nervous about getting questions, but turns out I don’t mind getting questions because I got lots of them both times and I think I handled them well.  The questions were generally of the “I’m having a light bulb moment and need to ask you if this tool will help me with my issue” type rather than the “you’re not explaining this well and I have to ask for clarification” type, which gave me a sense that the asker was really getting something out of the presentation.

The Bad:

I figured I’d be nervous during the presentation and prone to forget some important points, so I loaded my PowerPoint with notes for my own reference.  This worked well enough when I presented the session locally at the user group, but at SQL Saturday the connection was such that the audience saw what was on my screen, so no presenter view for me! No notes or timer!  Big oops.  I got off to a rocky start and I know I missed some points.  Maybe a lot of them.

Lesson learned: if I need notes, have them on paper, too.  Also have some other timer handy.

The Beautiful:

Fooled you, there’s no ugly.  I’m glad I did it.  I plan to do it again.  I asked for session evaluations and got 6.  I’m well aware that it could have gone better (see The Bad) but the evaluations were positive.  Between the evaluations and the questions, I’m confident that at least some people got something out of the session.  That’s enough for me to keep at it.

Hello Microsoft Flow

I’ve been reading about Microsoft Flow with interest, because I think that it may eventually solve a business problem for my company.  I don’t see the connectors I need yet, but I keep checking back and wishing I had more opportunity to try it out.

I just set up a new Twitter account: @PASS_ProfDev for the Professional Development virtual chapter of PASS and thought I would try a simple Flow.  I wanted tweets from my personal Twitter account to also post to the new account if they contain a certain hashtag.  The chapter has an Office 365 account with access to Flow, so I logged in and got started.

From Flow -> My Flows -> Create from Blank: it helpfully suggested the popular “When a new tweet is posted” trigger, which I selected.  Then I provided the login for my new twitter account. Next up was filling in the search criteria for the tweets I want to post from the new account.  I used Advanced Twitter Search to get the search text.  Then I told Flow what Action to do when it got a match, which was to “Post a tweet”, with the “Tweet text” set to “Tweet text”.

I chose to just tweet the same thing, but I could have added other text in there, for example mentioning the poster’s name, which I may have done if I were reposting all posts with the tag, instead of just my own.  Or I could have added in more conditions, such as checking that the poster has a minimum number of followers.  There are LOTS of triggers, conditions and actions that I could have used, but really this was meant to be the “hello world” of Flow.  There’s more information about the Twitter connection here.
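The trigger, condition and action pattern described above can be sketched in plain Python. This is only an illustration of the logic, not Flow’s own API: the `Tweet` class, the function names, the `me` account name and the `#example` hashtag are all hypothetical placeholders, and the hashtag match is a naive whole-word check.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    author: str
    text: str
    followers: int = 0


def should_repost(tweet, author, hashtag, min_followers=0):
    """Condition: right author, contains the hashtag, poster has enough followers."""
    words = tweet.text.lower().split()
    return (
        tweet.author == author
        and hashtag.lower() in words
        and tweet.followers >= min_followers
    )


def run_flow(tweets, author, hashtag, min_followers=0):
    """Trigger on each new tweet; the action is to post the same tweet text."""
    return [t.text for t in tweets if should_repost(t, author, hashtag, min_followers)]


if __name__ == "__main__":
    # Hypothetical sample tweets.
    incoming = [
        Tweet("me", "Great session today #example"),
        Tweet("me", "Lunch was good"),
        Tweet("someone_else", "Also talking about #example"),
    ]
    for text in run_flow(incoming, author="me", hashtag="#example"):
        print(text)
```

Only the first sample tweet passes the condition. Reposting everyone’s tweets with the tag would mean dropping the author check, and the follower check shows where an extra condition would slot in.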

Next I want to try something with Teams and Planner.  And I continue waiting for connectors to Word.

Mt. Prevost

I did a quick search for an image from MY island instead of just a generic island picture for the header and this picture caught my eye.  Not only is it pretty, but those mountains are what I see from my house.  Vancouver Island is pretty nice.

Giving Back (T-SQL Tuesday #102)

This seems like as good a time to start a blog as any, as I feel like 2018 is the year I start giving back to this community. I’ve been a member of PASS since the first year I went to Summit in 2013 and I’ve learned a lot from the members of PASS. The pathways to learning have been varied, but I have really benefited from presentations, blog posts, the hallway track, and even Twitter. I wanted to give back, but was not prepared to take on more commitments until recently. My kids are getting older and lately don’t need their mommy quite like they used to.

At last Summit I committed to organizing a SQL Saturday for my area for early March. When I was preparing for it I realized I was too out of the loop with regard to my local tech scene. Over the last couple years the local user group had dried up and I wasn’t getting out and meeting people. So I became the leader of the local defunct user group and as soon as SQL Saturday was over I secured a venue and started setting up meetings for the local user group.

Our first meeting was a meet and greet to gauge interest. A dozen people came and they were enthusiastic and contributed ideas. The next month I presented a session and again the attendees were engaged. I have speakers lined up until summer break and plan to have at least 3 meetings in the fall.

Did you notice in the previous paragraph that *I* presented a session? You may not be aware that that was a significant statement. I had not done that before. The even more significant part is that I did it AGAIN last weekend at SQL Saturday in Edmonton. I plan to work at giving more presentations, by developing another session to present locally in the fall and by submitting to more events.

I’m also an organizer for the Professional Development virtual chapter. We just hosted a presentation and have two more in the pipe.  You can expect life from that group again.