Programming Analytics

A feed of interesting tidbits from IT, software engineering, business intelligence, and videogaming.

A far better way of writing “string-to-type” conversion is here:

http://www.hanselman.com/blog/TypeConvertersTheresNotEnoughTypeDescripterGetConverterInTheWorld.aspx

Here’s the terrible thing about C# - it changes and grows so fast that a ton of fantastic stuff in it slips below the radar. This is by far the best little type-conversion kit I’ve seen!
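
To make the idea concrete, here’s a minimal sketch of the TypeDescriptor approach the post describes - the helper name is mine, but the calls come straight from System.ComponentModel:

using System;
using System.ComponentModel;

static class Conversion
{
    // Convert a string to any target type without writing a switch over
    // every possible type - GetConverter picks the right TypeConverter
    // (Int32Converter, DateTimeConverter, and so on) for us.
    public static T FromString<T>(string value)
    {
        var converter = TypeDescriptor.GetConverter(typeof(T));
        return (T)converter.ConvertFromInvariantString(value);
    }
}

// Usage:
//   int i = Conversion.FromString<int>("42");
//   DateTime d = Conversion.FromString<DateTime>("2012-08-10");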

I’ve started an open source project to create an Automated SSIS Decompiler. This program is so far capable of reading in a DTSX package and writing, as output, a C# source code file and an accompanying configuration file. I’ll continue to develop this program as long as it helps me research my current administration / maintenance process; anyone who is interested in providing assistance will be welcome to join in!
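
For the curious, the very first step is less exotic than it sounds: a .dtsx package is just XML, so you can walk it with LINQ to XML before generating any C#. Here’s a rough sketch of that step - attribute names vary between SSIS versions, so treat the DTS:ObjectName and DTS:ExecutableType lookups as illustrative rather than definitive:

using System;
using System.Xml.Linq;

class DtsxReader
{
    static readonly XNamespace Dts = "www.microsoft.com/SqlServer/Dts";

    static void Main(string[] args)
    {
        // Load the package XML and list every executable (task or container).
        var package = XDocument.Load(args[0]);
        foreach (var exe in package.Descendants(Dts + "Executable"))
        {
            var name = (string)exe.Attribute(Dts + "ObjectName") ?? "(unnamed)";
            var type = (string)exe.Attribute(Dts + "ExecutableType") ?? "(unknown type)";
            Console.WriteLine("{0}: {1}", type, name);
        }
    }
}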

Read this article today while researching ways to migrate an SSIS package into a C# program. It would really be neat if there was a way to programmatically convert an SSIS abomination into a readable, intelligible C# / .NET program. Maybe that will be a useful project ;)

http://ayende.com/blog/2637/ssis-15-faults

So here’s the thing: why is SSIS bad? Honestly, when I used Data Transformation Services back in the late ’90s, there was nothing wrong with it. DTS was a simple way to automate a complex, bulk data transformation process. It was limited, but that was okay - because it had a specific function: it imported and exported data to and from SQL Server.

Nowadays SSIS supports everything and the kitchen sink. What used to be straightforward (Import -> Select source -> Select columns -> Go!) is now a massive undertaking. I thought it was really neat how DTS could save a package so you could repeat it again in the future. If I’d known it would become this complex and impenetrable, I would have walked away.

I’ve been teaching people how to use basic regression for a while now, and I finally wrote it all up as an article here:

http://www.altdevblogaday.com/2012/08/10/business-analytics-with-regression/

There are lots of tools available that are far more complex than you’d ever need, but fortunately there are open source tools that can really help you get started smoothly. Good luck!
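
If you just want to see what “basic regression” means in code, here’s a tiny sketch of ordinary least squares for a single predictor; the numbers are made up purely for illustration:

using System;
using System.Linq;

class SimpleRegression
{
    static void Main()
    {
        double[] x = { 1, 2, 3, 4, 5 };           // e.g. weekly ad spend
        double[] y = { 2.1, 4.2, 5.9, 8.1, 9.8 }; // e.g. weekly units sold

        double meanX = x.Average(), meanY = y.Average();

        // slope = covariance(x, y) / variance(x); the intercept follows from the means
        double slope = x.Zip(y, (xi, yi) => (xi - meanX) * (yi - meanY)).Sum()
                     / x.Sum(xi => (xi - meanX) * (xi - meanX));
        double intercept = meanY - slope * meanX;

        Console.WriteLine("y = {0:F3} * x + {1:F3}", slope, intercept);
    }
}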

So I got tired of writing and rewriting the same command line user interface over and over again.

I have lots of programs that I write to be “automatable” - i.e. so that an operations team member can put it into a script and execute it, or schedule it, or batch it up.

Rather than continue rewriting this stuff over and over, I wrote a library that accomplishes universal execution automation via reflection:

http://code.google.com/p/csharp-command-line-wrapper/

You can view the code here, and drop it into your project at any time by copying and pasting this one file:

http://code.google.com/p/csharp-command-line-wrapper/source/browse/trunk/CommandWrapper/CommandWrapLib/CommandWrapLib.cs

It produces nice, clean output and parses command-line arguments into your variables.
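
To give a feel for the technique (this is a stripped-down illustration of the idea, not the library’s actual interface), the core trick is to reflect over a method’s parameters and convert each matching “--name value” pair on the command line:

using System;
using System.ComponentModel;
using System.Reflection;

static class TinyCommandWrapper
{
    // The "real" work an operations team member might script or schedule.
    public static void ArchiveLogs(string folder, int keepDays)
    {
        Console.WriteLine("Archiving {0}, keeping {1} days", folder, keepDays);
    }

    static void Main(string[] args)
    {
        MethodInfo method = typeof(TinyCommandWrapper).GetMethod("ArchiveLogs");
        ParameterInfo[] parms = method.GetParameters();
        var values = new object[parms.Length];

        foreach (var p in parms)
        {
            // Find "--parameterName" and convert the next token to the
            // parameter's declared type.
            int i = Array.IndexOf(args, "--" + p.Name);
            if (i < 0 || i + 1 >= args.Length)
                throw new ArgumentException("Missing --" + p.Name);
            values[p.Position] = TypeDescriptor.GetConverter(p.ParameterType)
                .ConvertFromInvariantString(args[i + 1]);
        }

        method.Invoke(null, values);
    }
}

// Usage: TinyCommandWrapper.exe --folder C:\Logs --keepDays 30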

This is truly the coolest photo I’ve seen in a while. Nice camera work. ;)

http://www.nasa.gov/mission_pages/msl/news/msl20120806b.html

This looks surprisingly useful! Why not have an Amazon-style scripted toolset for deploying images to your internal cloud?

http://www.codeproject.com/Articles/31961/Automating-VMWare-Tasks-in-C-with-the-VIX-API

This would have been fantastic back when we were deploying virtual GamePulse images every week.

Ever wonder how the Mac was possible? Now you can actually download the 68000 assembly source code for Bill Atkinson’s QuickDraw, one of the most impressive toolkits you never really thought about:

http://www.computerhistory.org/highlights/macpaint/

Joel Spolsky wrote a cool article on the problems with letting your programming backlog get too big.

http://www.joelonsoftware.com/items/2012/07/09.html

Over the past five years I suffered heavily from this. There were always hundreds of bugs being reported; there were always years’ worth of backlogged programming ideas in the queue. At the time, I felt it was useful to have a massive backlog of ideas.

But of course, at the same time, we kept our programming velocity up: We had a daily scrum process and we identified the most promising changes that could get into our codebase right now. Because of that, we were able to deliver incredibly rapid change and keep our product at the top of the market even when we had fewer programmers than any of our competitors.

I like to think of it as “banking” your changes. Any code you write is ephemeral until it is launched. As you work, you build up lots of potential improvements. But if you don’t deploy your code - real artists ship - your changes never benefit anyone. We kept up our velocity by shipping all changesets that were ready, deploying code to live about once every three weeks.

I did a lot of work in sales data processing. In America, sales data is reported monthly, and months are handled in one of two ways: either as a calendar month or as an NRF 4-5-4 calendar month, which is used to make year-over-year month comparisons consistent. Basically, the NRF calendar allows you to compare January 2011 to January 2012 in such a way that each month contains the same number of weekdays and weekend days.

When I started working with European data, I was overjoyed to discover that it was reported weekly. Even better, all the weeks appeared to start on Sunday. I could simply load data from all the European countries, which provided the year and week numbers for each data point. I wrote a tiny function in my code that took these two numbers and produced a date, and a separate function that did the opposite. Here’s what they looked like:

-- Determine the week number of this date
SELECT DATEPART(week, GETDATE())

-- Starting from January 1st, backtrack to Sunday, then add 29 weeks
-- to get the start date for week 30. (Note: DATEPART(weekday) depends
-- on the session's DATEFIRST setting; the default, 7, makes Sunday = 1.)
SELECT DATEADD(week, 29,
    DATEADD(day, 1 - DATEPART(weekday, '2012-01-01'), '2012-01-01'))

This worked great until I started receiving data from provider A which said “Week 53 of 2008” whereas the same data from provider B said “Week 1 of 2009”. All my carefully constructed logic blew up. Because these providers would not explain what algorithm they used to calculate their year-week numbers, I had to infer their logic from how their calendars changed. I wrote this post to hopefully explain to you the challenge involved in using year-week numbers; I’m sorry it’s not more helpful.

Here are some of the definitions I have seen used:

  • Weeks start Sunday OR weeks start Monday.
  • The first week that contains January 1st of year X is considered week 1 of year X OR the first week whose first day starts after January 1st of year X is considered week 1 of year X.
  • I’ve also worked with data companies that use a “balance of days” principle, where the first week that has three or more days within year X is considered week 1 of year X.

With this in mind, you can determine year-week numbers in any way you want. Just remember that the numbers are not standardized unless you publish your definition of how you calculate year-week numbers.
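
As an example of publishing your definition, here’s a small C# sketch of one explicit rule set - weeks start on Sunday, and the week containing January 1st is week 1 of that year (the same convention the T-SQL above uses with the default DATEFIRST setting). Swap the rules and the same year-week pair maps to a different date, which is exactly the provider A versus provider B problem:

using System;

static class YearWeek
{
    // Sunday on or before January 1st of the given year = start of week 1.
    public static DateTime FirstDayOfWeekOne(int year)
    {
        var jan1 = new DateTime(year, 1, 1);
        return jan1.AddDays(-(int)jan1.DayOfWeek);   // DayOfWeek.Sunday == 0
    }

    // Year + week number -> the date the week starts on.
    public static DateTime StartOfWeek(int year, int week)
    {
        return FirstDayOfWeekOne(year).AddDays(7 * (week - 1));
    }

    // Date -> year + week number, under the same definition.
    public static void GetYearWeek(DateTime date, out int year, out int week)
    {
        year = date.Year;
        var start = FirstDayOfWeekOne(year + 1);
        if (date.Date >= start)
            year++;                          // late-December dates roll into week 1 of the next year
        else
            start = FirstDayOfWeekOne(year);
        week = (date.Date - start).Days / 7 + 1;
    }
}

// Under this definition, December 30, 2008 comes out as week 1 of 2009
// (provider B's answer); a provider using one of the other rules above
// could legitimately call that same week "week 53 of 2008" instead.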