Cell C Q1: What is the size of the vocabulary?
Cell F Q2: What do you expect the following probabilities to be? Why?
Cell H Q3: Run this code to produce a plot. What does this plot show? What is on the x-axis, and what is on the y-axis?
Cell H Q4: According to this plot, is the ngram assumption justified? If you don't remember what the ngram assumption is, ask Nils (a standard formulation is also sketched after this list).
Cell I Q5: Why does this throw an error?
Cell K Q6: Why doesn't the dev perplexity keep decreasing as n increases?
Cell L Q7: What are the best values for n and smoothing?
Cell L Q8: How many parameters does the best ngram model have? Explain how you compute this quantity.
Cell M Q9: What do you notice?
Cell N Q10: How does this RNN LM deal with words outside of its vocabulary?
Cell O Q11: How does the number of parameters of this model compare to the number of parameters of the best ngram model?
Cell P Q12: How does the following compare to the best ngram model?
Cell Q Q13: What else could we have done to further push the dev loss down?
Cell S Q14: How does the following compare to the other models?
Cell T Q15: What do you notice? Is there a phrase that describes this mismatch between distributions?
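For reference (Q4): the ngram assumption, also called the Markov assumption, is usually stated as below. This is a standard textbook formulation, not necessarily the exact notation the notebook uses:

$$P(w_t \mid w_1, \ldots, w_{t-1}) \approx P(w_t \mid w_{t-n+1}, \ldots, w_{t-1})$$

That is, an ngram model conditions each word on only the previous n-1 words instead of the full history.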