6.891 HW #2 Clarifications

The suggested explanation lengths (e.g., "one sentence" or "two sentences suffice") are guidelines, not strict upper bounds. You may write more if you feel that one or two sentences would not answer the question adequately.
    1. The link between the delta rule in homework 2, question 1a and homework 1, question 2c is easiest to see if the labels (y values) are treated as vectors rather than integers. That is, instead of taking the labels to be {1,2,...,k}, think of them as {[1 0 ... 0], [0 1 0 ... 0], ..., [0 ... 0 1]}. It may be instructive to first reformulate question 2c on homework 1 as a problem where the labels are the vectors {[1 0], [0 1]} rather than {0,1}.

      Equation (5) should have a sum between the epsilon and the partial derivative with respect to w_{ij} (similar to equation (5) on the last homework).
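
The relabeling suggested above for question 1a can be sketched in a few lines. This is only an illustration in Python, not part of the distributed homework code; the helper names `one_hot` and `binary_to_vector` are hypothetical, and the choice to map 0 to [1 0] and 1 to [0 1] is an assumed convention.

```python
def one_hot(label, k):
    """Return the length-k indicator vector for a label in {1, ..., k}."""
    if not 1 <= label <= k:
        raise ValueError("label must be in 1..k")
    # Position `label` gets a 1, every other position gets a 0.
    return [1 if j == label else 0 for j in range(1, k + 1)]


def binary_to_vector(y):
    """Map a binary label in {0, 1} to a 2-dimensional indicator vector.

    Assumed convention: 0 -> [1, 0] and 1 -> [0, 1].
    """
    return one_hot(y + 1, 2)


# Integer labels from {1, 2, 3} rewritten as indicator vectors.
labels = [1, 3, 2]
encoded = [one_hot(y, 3) for y in labels]
print(encoded)  # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

With the labels in this form, each output unit has its own target component, which may make the comparison between the two questions easier to carry out.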

    1. The topk.m code originally distributed with the homework was slightly buggy. Get the updated version here.
    2. The wrapper method described in recitation is not identical to the wrapper method used for this problem. Wrapper methods are a class of feature selection methods that directly involve the classification model, so the two are different members of the same class.
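
To make the idea of a wrapper method concrete, here is a minimal sketch of one generic member of the class: greedy forward selection, where each candidate feature is scored by the validation accuracy of a classifier trained with it. This is not the method from recitation and not necessarily the one used in this problem. The nearest-centroid classifier, the function names (`centroid`, `accuracy`, `forward_select`), and the toy data are all assumptions chosen for illustration.

```python
def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def accuracy(train_X, train_y, val_X, val_y, features):
    """Validation accuracy of a nearest-centroid classifier restricted to `features`."""
    def project(x):
        return [x[f] for f in features]

    classes = sorted(set(train_y))
    cents = {
        c: centroid([project(x) for x, y in zip(train_X, train_y) if y == c])
        for c in classes
    }

    def predict(x):
        p = project(x)
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(classes, key=lambda c: sum((a - b) ** 2 for a, b in zip(p, cents[c])))

    correct = sum(predict(x) == y for x, y in zip(val_X, val_y))
    return correct / len(val_y)


def forward_select(train_X, train_y, val_X, val_y, num_features):
    """Greedily add the feature that gives the best validation accuracy."""
    selected = []
    remaining = list(range(len(train_X[0])))
    while remaining and len(selected) < num_features:
        # The classifier itself scores each candidate subset: this is the "wrapper" step.
        best = max(
            remaining,
            key=lambda f: accuracy(train_X, train_y, val_X, val_y, selected + [f]),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (made up): only feature 1 separates the two classes.
train_X = [[0.5, 0.0, 1.0], [0.4, 0.1, 0.0], [0.5, 1.0, 0.0], [0.4, 0.9, 1.0]]
train_y = [0, 0, 1, 1]
val_X = [[0.45, 0.2, 1.0], [0.45, 0.8, 0.0]]
val_y = [0, 1]

print(forward_select(train_X, train_y, val_X, val_y, 1))  # [1]
```

The point of the sketch is only that the classifier sits inside the selection loop; different wrapper methods differ in the search strategy and in how each subset is scored.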