Design of Experiments 101: Cross Validation

What is an experiment?

An experiment is a procedure that you perform in order to validate (or to reject) your hypothesis.

Your hypothesis might be that the selection strategy, the classifier (or regressor), or a clever combination of the two that you developed performs better than others. Or maybe you just want to try your approaches in the wild (on your data) and assess the results.

For the sake of simplicity, let’s assume that you have a paradigm H (your hypothesis), a data set X, and a performance measure E (this is how you assess the performance of your approach numerically; e.g. classification accuracy).

The following approach works for supervised learning too, not just for active learning.

A simple example

The main idea behind the design of experiments is this:

designing an experiment is like organizing a contest.

The Contest: Alice has a dataset consisting of 100 data points and wants to know whether Bob or Carl is the better data scientist. So she gives both Bob and Carl the same 75 data points and asks each of them to provide the best model they can build. Afterwards, Alice will compare both models on the 25 data points she held back.

The Optimization: Now both data scientists try to find the best parameters for their models. They also split their data: 60 data points for training and 15 for validation. After training several models with different parameters on the 60 data points, each of them chooses the model that performs best on the remaining 15 data points.

The Comparison: Finally, Alice evaluates the final models of both data scientists on her held-out data. Bob wins if his model performs better; otherwise Carl wins.

Our terminology

In the following, we use these terms to describe the different kinds of subsets (see also Wikipedia):

  • Outer training set: the data Bob and Carl are given by Alice to find their best approach (75 data points)
  • Outer test set (often: test or evaluation set): the data Alice held back to test Bob’s and Carl’s approaches (25 data points)
  • Inner training set (often: training set): the data Bob and Carl used to train a model with specific parameters of their approach (60 data points)
  • Inner test set (often: validation set): the data Bob and Carl used to determine the best parameter set (15 data points)
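The four subsets of the contest can be sketched with plain index lists. This is a minimal illustration using only the Python standard library; the helper name `holdout_split` and the fixed random seed are our own choices for the example, not part of the original setup.

```python
import random


def holdout_split(indices, n_test, rng):
    """Shuffle a copy of `indices` and split off `n_test` of them as a test set."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


rng = random.Random(0)  # fixed seed so the split is reproducible
all_points = range(100)

# Alice: outer training set (75) for Bob and Carl, outer test set (25) held back
outer_train, outer_test = holdout_split(all_points, 25, rng)

# Bob or Carl: inner training set (60) and inner test / validation set (15),
# carved out of the outer training set only
inner_train, inner_test = holdout_split(outer_train, 15, rng)

print(len(outer_train), len(outer_test), len(inner_train), len(inner_test))  # 75 25 60 15
```

Note that the inner split is taken from `outer_train` only, so Alice's 25 held-back points never influence the parameter choice.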

How can Bob and Carl do better (improve the generalization of their training procedure)?

So far, each data scientist had just one fixed training set (inner training set) and one validation set (inner test set). By chance, this single validation set might be particularly easy for one parameter setting and particularly difficult for another. Hence, we should ensure that every instance is used for validation exactly once.

In k-fold cross validation, the data given by Alice (75 data points) is split into \(k=5\) folds, so each data scientist has 5 subsets with 15 instances each. To predict the labels of the first fold, the data from folds 2, 3, 4, and 5 is used for training; for the second fold, the algorithm is trained on folds 1, 3, 4, and 5; and so on. Because every instance is used for validation exactly once and the performance is averaged over all \(k\) folds, this estimate is much more robust than one based on a single split. Hence, it is more likely that the parameter setting that performed best actually is the best one for the given data.
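The fold construction described above can be sketched as follows. This is a minimal standard-library version under our own assumptions: the indices are shuffled once with a fixed seed and then dealt round-robin into \(k\) folds; the function name `k_fold_splits` is hypothetical.

```python
import random


def k_fold_splits(n, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds.

    Indices 0..n-1 are shuffled once and dealt into k folds of (nearly) equal
    size, so every index lands in exactly one test fold.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # training data = all folds except the current test fold
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for fold_no, (train, test) in enumerate(k_fold_splits(75, 5), start=1):
    print(f"fold {fold_no}: {len(train)} training, {len(test)} test")  # 60 training, 15 test
```

Shuffling before splitting avoids folds that mirror any ordering in how the data was collected.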

But now a new problem arises. Because of the k-fold cross validation, each data scientist has 5 different models for the best parameter setting. Carl did not know what to do, so he chose one of them at random. Bob had a better idea: he took the parameter setting he had found to be best and retrained the model on all 75 data points he was given.
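Bob's strategy (tune with k-fold cross validation, then retrain once on everything) might look like the sketch below. It is only an illustration under stated assumptions: the "approach" is a hypothetical 1-D k-nearest-neighbour classifier whose tuned parameter is `n_neighbors`, the performance measure E is classification accuracy, and the 75 data points are synthetic.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, x, n_neighbors):
    """Majority vote among the n_neighbors training points closest to x (1-D)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:n_neighbors]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, n_neighbors):
    """Performance measure E: share of test points whose label is predicted correctly."""
    hits = sum(knn_predict(train_x, train_y, x, n_neighbors) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_x)


def cv_score(xs, ys, n_neighbors, k=5, seed=0):
    """Mean accuracy of one parameter setting over a shuffled k-fold split."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    total = 0.0
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        total += accuracy([xs[t] for t in train], [ys[t] for t in train],
                          [xs[t] for t in test], [ys[t] for t in test], n_neighbors)
    return total / k


def tune_and_refit(xs, ys, candidates, k=5):
    """Bob's strategy: pick the setting with the best CV score, then retrain on all data."""
    best = max(candidates, key=lambda c: cv_score(xs, ys, c, k))

    def final_model(x):
        # uses all len(xs) points, not only the (k-1)/k of them seen by a single fold model
        return knn_predict(xs, ys, x, best)

    return best, final_model


# Synthetic outer training set of 75 points (illustrative only)
rng = random.Random(1)
xs = [rng.random() for _ in range(75)]
ys = [int(x + rng.gauss(0, 0.2) > 0.5) for x in xs]

best, model = tune_and_refit(xs, ys, candidates=[1, 3, 5, 7, 9])
print("best n_neighbors:", best, "| prediction for x = 0.9:", model(0.9))
```

Carl's alternative would correspond to returning one of the five fold models, each trained on only 60 of the 75 points.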

How can Alice do better?

Alice faces a situation similar to Bob’s and Carl’s. Maybe someone just got lucky, or the selection of training and test instances favored one of the competitors. Hence, Alice also performs k-fold cross validation (here \(k=4\), so each of her 4 test folds contains 25 data points). Bob and Carl are therefore asked to provide 4 different models, and Alice checks whether the results are consistent.

To be even more certain, she condenses each k-fold cross validation into a single performance value (for example, the mean over the \(k\) folds). Then she repeats the whole procedure several times with different random splits to make sure the results are not due to chance.
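Alice's repeated cross validation can be sketched like this. Assumptions for the illustration: each repetition is summarized by the mean over its folds, repetitions differ only in the shuffling seed, and a hypothetical 1-nearest-neighbour model on synthetic 1-D data stands in for a competitor's approach.

```python
import random
from statistics import mean, stdev


def k_fold_splits(n, k, seed):
    """Yield (train, test) index lists for one shuffled k-fold split of 0..n-1."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def repeated_cv(n, k, repeats, evaluate):
    """Repeat k-fold CV with a different shuffle each time.

    Each repetition is condensed into one value: the mean score over its k folds.
    """
    per_run = []
    for r in range(repeats):
        fold_scores = [evaluate(train, test) for train, test in k_fold_splits(n, k, seed=r)]
        per_run.append(mean(fold_scores))
    return per_run


# Synthetic 1-D data set (illustrative only): label 1 roughly when x > 0.5
rng = random.Random(42)
xs = [rng.random() for _ in range(100)]
ys = [int(x + rng.gauss(0, 0.2) > 0.5) for x in xs]


def one_nn_accuracy(train, test):
    """Stand-in for one competitor's model: 1-nearest-neighbour accuracy on `test`."""
    hits = 0
    for t in test:
        nearest = min(train, key=lambda i: abs(xs[i] - xs[t]))
        hits += ys[nearest] == ys[t]
    return hits / len(test)


scores = repeated_cv(100, 4, repeats=10, evaluate=one_nn_accuracy)
print(f"{len(scores)} repetitions, mean {mean(scores):.3f}, stdev {stdev(scores):.3f}")
```

A small spread across repetitions suggests the result does not hinge on one lucky split.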

Summary: How do you split your data?

The main idea of cross validation is to ensure that the model has never seen the test data during training: the test data must be used neither for training nor for tuning. If we want to rank different algorithms, each with its best parameter setting, we need a two-stage (nested) cross validation: the outer cross validation compares the algorithms, and on each outer training set we perform a separate inner cross validation to select the parameters. More details can be found in the Wikipedia pages mentioned above.
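The two-stage procedure can be sketched end to end. This is a minimal illustration, not a reference implementation: the two approaches (a 1-D k-nearest-neighbour classifier attributed to Bob and a fixed-threshold rule attributed to Carl), their parameter grids, and the synthetic data are all hypothetical. The fold sizes follow the contest: 4 outer folds of 25 points, and 5 inner folds of 15 points on each outer training set of 75.

```python
import random
from collections import Counter
from statistics import mean


def folds(indices, k, seed):
    """Shuffle a copy of `indices`, deal it into k folds, yield (train, test) pairs."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    parts = [idx[i::k] for i in range(k)]
    for i, test in enumerate(parts):
        train = [j for part in parts[:i] + parts[i + 1:] for j in part]
        yield train, test


# Two hypothetical approaches with one parameter each: fit(...) returns a predictor.
def knn(xs, ys, train, n_neighbors):
    def predict(x):
        nearest = sorted(train, key=lambda i: abs(xs[i] - x))[:n_neighbors]
        return Counter(ys[i] for i in nearest).most_common(1)[0][0]
    return predict


def threshold(xs, ys, train, t):
    # Fixed decision rule; only the parameter t is "learned" (by the inner CV).
    return lambda x: int(x > t)


def score(fit, param, xs, ys, train, test):
    """Train on `train` with `param`, return accuracy on `test`."""
    predict = fit(xs, ys, train, param)
    return sum(predict(xs[i]) == ys[i] for i in test) / len(test)


def inner_cv(fit, grid, xs, ys, train, k, seed):
    """Inner CV on one outer training set: return the best parameter in `grid`."""
    def cv_mean(param):
        return mean(score(fit, param, xs, ys, tr, va) for tr, va in folds(train, k, seed))
    return max(grid, key=cv_mean)


def nested_cv(approaches, xs, ys, k_outer=4, k_inner=5, seed=0):
    """Outer CV compares approaches; every outer training set gets its own inner CV."""
    results = {name: [] for name in approaches}
    for outer_train, outer_test in folds(range(len(xs)), k_outer, seed):
        for name, (fit, grid) in approaches.items():
            best = inner_cv(fit, grid, xs, ys, outer_train, k_inner, seed)
            # Refit with the chosen parameter on the whole outer training set,
            # then evaluate exactly once on the outer test fold.
            results[name].append(score(fit, best, xs, ys, outer_train, outer_test))
    return {name: mean(scores) for name, scores in results.items()}


# Synthetic data set of 100 points (illustrative only)
rng = random.Random(7)
xs = [rng.random() for _ in range(100)]
ys = [int(x + rng.gauss(0, 0.15) > 0.4) for x in xs]
approaches = {
    "Bob (k-NN)": (knn, [1, 3, 5, 7]),
    "Carl (threshold)": (threshold, [0.3, 0.4, 0.5, 0.6]),
}
print(nested_cv(approaches, xs, ys))
```

The outer test fold is touched only after the inner cross validation has fixed the parameter, which is exactly the separation between tuning and evaluation described above.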

If you are interested in how to evaluate active learning algorithms, please see the paper:
Challenges of Reliable, Realistic and Comparable Active Learning Evaluation by Kottke, Calma et al.
