Caravel Forum : Caravel Boards : The Site : Hold rating standard deviation
eytanz
Level: Smitemaster
Avatar
Rank Points: 2708
Registered: 02-05-2003
IP: Logged
icon Hold rating standard deviation (0)  
I've started this new thread so I can discuss the merits of the new feature independent of the discussion on rating philosophy.

I think that it's definitely worthwhile.

Looking at the top 3 rated holds:

Magic Show: Mean rating 9.1, Stdev 0.77 (12 votes)
Lendy's Dungeon: Mean rating 9.0, Stdev 0.71 (13 votes)
The Fool's Errand: Mean rating 8.9, Stdev 1.95 (35 votes)

Based on this, I can figure out that both Magic Show and Lendy's Dungeon were thought to be great to excellent holds by everyone who voted for them. I have also learnt that The Fool's Errand was thought to be just as good overall, but that there was a lot more individual disagreement among the raters.

This reflects useful information about these holds and corresponds to my experience of them - I think everyone should be expected to enjoy Magic Show and Lendy's Dungeon, but that while most people enjoy Fool's Errand, there's a chance you won't enjoy it.

Thus (at least as someone who understands how to read the statistics), the stdev told me something new (or something that would have been new had I not already played these 3 holds) and interesting about 3 otherwise similar-seeming holds (note that Lendy's and Fool's Errand have the same difficulty rating).


____________________________
I got my avatar back! Yay!
09-28-2006 at 11:00 PM
agaricus5
Level: Smitemaster
Rank Points: 1838
Registered: 02-04-2003
IP: Logged
icon Re: Hold rating standard deviation (0)  
To add to Eytan's post, I did some statistical analysis (although I'm not entirely sure about the exact accuracy) in this post too.

____________________________
Resident Medic/Mycologist
09-28-2006 at 11:31 PM
jbluestein
Level: Smitemaster
Avatar
Rank Points: 1670
Registered: 12-23-2005
IP: Logged
icon Re: Hold rating standard deviation (0)  
Hey, let's make an appropriately small standard deviation a requirement for creating a Smitemaster's Selection hold!



____________________________
"Rings and knots of joy and grief, all interlaced and locking." --William Buck
09-29-2006 at 07:57 PM
agaricus5
Level: Smitemaster
Rank Points: 1838
Registered: 02-04-2003
IP: Logged
icon Re: Hold rating standard deviation (0)  
jbluestein wrote:
Hey, let's make an appropriately small standard deviation a requirement for creating a Smitemaster's Selection hold!

That would highly disadvantage architects with holds that some people like, but others don't. Would you say that a hold like Bavato's Dungeon with a very large s.d. (and a large standard error) should disqualify an author simply because opinions on it are highly divided?

____________________________
Resident Medic/Mycologist

[Last edited by agaricus5 at 09-29-2006 11:20 PM]
09-29-2006 at 09:15 PM
Tahnan
Level: Smitemaster
Avatar
Rank Points: 2460
Registered: 11-14-2005
IP: Logged
icon Re: Hold rating standard deviation (0)  
So: I really, really wanted to take statistics in college--I was a math major, and I needed one more course in my last semester--but unfortunately, it was being taught by the worst professor in the department.

Consequently, I fear that I have no real clue what the standard deviation really indicates. That is, I do know its technical meaning, but I don't know how to interpret it. To take the hold that's been the touchstone in all this: currently, the Underground Civilization has a rating of 8.2 and a sigma of 2.03. Which means that...most of the votes fall between 6.17 and 10.23? No, hold on, it means that people, er...

I have no idea what it means. Therefore, following the suggestions made independently by me in this thread and Oneiromancer in this thread, I'll formally request:

I'd like to be able to see the full voting results (as with polls) rather than just the standard deviation.
09-29-2006 at 09:49 PM
jbluestein
Level: Smitemaster
Avatar
Rank Points: 1670
Registered: 12-23-2005
IP: Logged
icon Re: Hold rating standard deviation (0)  
agaricus5 wrote:
jbluestein wrote:
Hey, let's make an appropriately small standard deviation a requirement for creating a Smitemaster's Selection hold!

That would highly disadvantage architects with holds that some people like, but others don't. Would you say that a hold like Bavato's Dungeon with a very large s.d. (and a large standard error) should disqualify an author simply because opinions on it are highly divided?

Sorry...I forget that my sense of humor doesn't translate very effectively into type. I was just joking -- I have no interest in further restricting who can produce Smitemaster's Selections.

Although I should note that after being stuck on Level 3 of Bavato's Dungeon for, ummm, months, anything that blocks the evil designer of that hold (whoever he is) from producing more holds is a good thing in my book.


____________________________
"Rings and knots of joy and grief, all interlaced and locking." --William Buck
09-29-2006 at 09:50 PM
Oneiromancer
Level: Legendary Smitemaster
Avatar
Rank Points: 2936
Registered: 03-29-2003
IP: Logged
icon Re: Hold rating standard deviation (+1)  
Tahnan wrote:
Consequently, I fear that I have no real clue what the standard deviation really indicates. That is, I do know its technical meaning, but I don't know how to interpret it. To take the hold that's been the touchstone in all this: currently, the Underground Civilization has a rating of 8.2 and a sigma of 2.03. Which means that...most of the votes fall between 6.17 and 10.23? No, hold on, it means that people, er...
For a perfect bell curve, about 68% of the observations are located within 1 standard deviation of the mean (that is, 1 standard deviation on either side). About 95% are located within 2 standard deviations, and about 99.7% are within 3. When you have a skewed distribution like with your example, where the mean is close to the maximum possible value, you can't think of it as a perfect bell curve any longer. You can still think of it as a certain % of observations being within a number of standard deviations of the mean, but they will not be evenly distributed on each side.
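As a rough illustration of the percentages above, here is a minimal Python sketch. The function names `normal_coverage` and `clipped_band` are made up for this example, and the 1 to 10 scale limits are an assumption about how the rating scale is bounded. It computes the bell-curve shares for 1, 2 and 3 standard deviations, then shows what happens to the one-sigma band for the Underground Civilization figures (mean 8.2, sigma 2.03) once it runs into the top of the scale.

```python
from statistics import NormalDist


def normal_coverage(k: float) -> float:
    """Share of a perfect bell curve lying within k standard deviations of its mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


def clipped_band(mean: float, sd: float, k: float = 1.0,
                 lo: float = 1.0, hi: float = 10.0) -> tuple[float, float]:
    """mean +/- k*sd, cut off at the ends of the rating scale (assumed to be 1 to 10)."""
    return max(lo, mean - k * sd), min(hi, mean + k * sd)


for k in (1, 2, 3):
    print(f"within {k} sd of the mean: {normal_coverage(k):.1%}")

# Figures quoted in the thread for the Underground Civilization.
low, high = clipped_band(8.2, 2.03)
print(f"one-sigma band on the rating scale: {low:.2f} to {high:.2f}")
```

The band is lopsided: it reaches about two points below the mean but less than two above it, which is one way of seeing why the bell-curve percentages only apply loosely here.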

The Wikipedia article on standard deviations isn't too bad if you have some math background. The section on "geometric interpretation" is the best to me because I am a very visual person when it comes to math or statistics.

Game on,

____________________________
"He who is certain he knows the ending of things when he is only beginning them is either extremely wise or extremely foolish; no matter which is true, he is certainly an unhappy man, for he has put a knife in the heart of wonder." -- Tad Williams
09-29-2006 at 10:47 PM
agaricus5
Level: Smitemaster
Rank Points: 1838
Registered: 02-04-2003
IP: Logged
icon Re: Hold rating standard deviation (0)  
Tahnan wrote:
So: I really, really wanted to take statistics in college--I was a math major, and I needed one more course in my last semester--but unfortunately, it was being taught by the worst professor in the department.

Consequently, I fear that I have no real clue what the standard deviation really indicates. That is, I do know its technical meaning, but I don't know how to interpret it. To take the hold that's been the touchstone in all this: currently, the Underground Civilization has a rating of 8.2 and a sigma of 2.03. Which means that...most of the votes fall between 6.17 and 10.23? No, hold on, it means that people, er...

I have no idea what it means. Therefore, following the suggestions made independently by me in this thread and Oneiromancer in this thread, I'll formally request:

I'd like to be able to see the full voting results (as with polls) rather than just the standard deviation.
I agree with you there; the actual distribution will tell you far more. Until we have that, the post I linked to earlier, with a bit of statistical analysis, may prove useful to you.

Since this is the topic for it here, I'll quote it again:

I wrote:
Just as a bit of information, the statistics of the scoring system are probably quite interesting with lots of votes. The distribution of scores is highly likely to be non-normal, probably bi-modal (with the smaller peak near the low end), and also probably skewed well over towards the top.

Basically, the graph showing the number of votes for each score is likely not to be bell-shaped like the Normal Distribution. It probably has two peaks, not one, with a small peak near the lower score end and a bigger one around the mean, and the right side of the second peak (i.e. the number of people who voted 9 and 10) is likely to be reasonably high compared to the peak.

Although the distributions are not normal, there is a theorem called the Central Limit Theorem that states roughly that for any distribution with a mean, (μ), and finite standard deviation (σ), if you pick at random a certain number of people or things from it (in this case, you could pick a number of DRODders at random and ask them what they would vote), and average the numbers, you'll find that if you did it over and over many times, the average mean you'd get will be μ, and the means would be distributed approximately normally (i.e. the graph of means would be roughly bell-shaped), with a standard deviation of σ/√n, where n is the number of things you picked.

So, for example, Bavato's Dungeon, with 45 votes, has a mean score of 8.8, and a standard deviation of 2.13.
I expect that because the mean is near 9, that the peak is at around 9 (i.e. the largest number of votes were 9s), and that there are also quite a few 10s as well. The standard deviation is quite large, meaning the scores are probably very widely spaced. So, I expect there to be many 8s, fewer 7s, and probably almost zero 6s, with a small peak between 3-5, and maybe one or two 1s and 2s.

If I were then to do a random experiment by picking 45 hapless DRODders at random from the population and asking them to vote on Bavato's Dungeon, and did this many many times (let's say, 100 times), then, if the 45 votes the hold has now are representative of what people think...

I would find the average scores for the groups of people will probably have a mean of something near 8.8. I would also find the standard deviation of these averages to be about 0.318, and the average scores put onto a graph would probably look roughly like a bell-shaped Normal curve. As nearly 95% of all the values will probably lie within 2 standard deviations of the mean, I can be probably 95% sure that the true mean (i.e. the mean I'd get if I made everyone vote, including those who will play DROD in the future) lies between 8.16 and 9.34.

For another example, HIJK has 22 votes, a μ of 8.8 and a σ of 0.89. The σ is much smaller, and so consequently, probably most people voted between 7 and 10, with very few low votes. Assuming the current votes are representative of everyone, we can be 95% sure that the true mean rating for the hold is between 8.42 and 9.18. Basically, it is more likely that the average rating for HIJK is around 8.8, while the average rating for Bavato's Dungeon is more likely to move either up or down, rather than remain at 8.8 (it's probably more likely to fall, given that it has already fallen over the last 9 months).
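The arithmetic behind those intervals can be reproduced with a short Python sketch. The helper names are invented for this example, and the multiplier of 2 standard errors is the rule of thumb used in the quoted analysis rather than an exact 95% quantile. With the quoted inputs, the standard error for Bavato's Dungeon comes out at about 0.318 and the HIJK interval at roughly 8.42 to 9.18, as stated. The upper bound for Bavato's Dungeon works out nearer to 9.44 than to the 9.34 given above.

```python
from math import sqrt


def standard_error(sd: float, n: int) -> float:
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sd / sqrt(n)


def approx_95_interval(mean: float, sd: float, n: int, z: float = 2.0) -> tuple[float, float]:
    """Rough 95% interval for the true mean, using mean +/- z standard errors."""
    half_width = z * standard_error(sd, n)
    return mean - half_width, mean + half_width


# (mean, standard deviation, number of votes) as quoted in the thread.
holds = {
    "Bavato's Dungeon": (8.8, 2.13, 45),
    "HIJK": (8.8, 0.89, 22),
}

for name, (mean, sd, n) in holds.items():
    lo, hi = approx_95_interval(mean, sd, n)
    print(f"{name}: SE {standard_error(sd, n):.3f}, interval {lo:.2f} to {hi:.2f}")
```

All of this still rests on the assumption stated in the analysis, namely that the existing votes are representative of what people think.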

____________________________
Resident Medic/Mycologist

[Last edited by agaricus5 at 09-29-2006 11:21 PM]
09-29-2006 at 11:20 PM
zex20913
Level: Smitemaster
Avatar
Rank Points: 1723
Registered: 02-04-2003
IP: Logged
icon Re: Hold rating standard deviation (+2)  
I'm a math person too, but I like to try to put things in layman's terms for non-math people.

Basically, standard deviation tells you how close everything is to the mean. A standard deviation of 0 means the only data submissions were the mean. Low standard deviations mean people rated near the mean, and high means that the votes (in this case) were scattered.

So an 8.8 average score with 2 as its standard deviation has many more scattered votes (in both directions) than an 8.8 with 0.4 as its standard deviation.
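To make that comparison concrete, here is a small Python sketch with two made-up sets of ten votes. Neither list is real voting data from any hold. Both average 8.8, but one clusters tightly and the other is scattered. It uses the population standard deviation (`statistics.pstdev`); whether the site divides by n or by n - 1 is not stated in this thread, so that choice is an assumption.

```python
from statistics import mean, pstdev

# Hypothetical votes, invented for illustration only.
tight_votes = [9, 9, 9, 9, 9, 9, 9, 9, 8, 8]
scattered_votes = [10, 10, 10, 10, 10, 10, 9, 9, 6, 4]

for label, votes in (("tight", tight_votes), ("scattered", scattered_votes)):
    print(f"{label:>9}: mean {mean(votes):.1f}, stdev {pstdev(votes):.2f}")
```

The tight list gives a standard deviation of about 0.4 and the scattered one about 2, even though the averages are identical.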

____________________________
09-29-2006 at 11:36 PM
coppro
Level: Smitemaster
Rank Points: 1309
Registered: 11-24-2005
IP: Logged
icon Re: Hold rating standard deviation (0)  
And as for what agaricus said, basically, if you have a distribution that isn't bell-shaped (like our holds when they have a high rating, because it's easier to rate farther from the average in one direction than the other), if you take a random sample of it and average it, (so take, say, a random 10 ratings and average them) and you keep doing this, you do get a bell distribution of the averages.
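The resampling described above can be tried out with a short simulation. This is only a sketch: the vote distribution in `WEIGHTS` is invented (skewed toward the top with a small bump at the low end), the sample size of 10 follows the example in the post, and the number of repeats and the random seed are arbitrary choices.

```python
import random
from statistics import mean, pstdev

# Hypothetical, made-up distribution of votes: NOT real data for any hold.
SCORES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
WEIGHTS = [1, 1, 2, 2, 1, 1, 3, 8, 14, 12]


def sample_means(sample_size: int, repeats: int, rng: random.Random) -> list[float]:
    """Draw `repeats` random samples of votes and return the average of each."""
    return [
        mean(rng.choices(SCORES, weights=WEIGHTS, k=sample_size))
        for _ in range(repeats)
    ]


# Mean and standard deviation of the made-up distribution itself.
total = sum(WEIGHTS)
pop_mean = sum(s * w for s, w in zip(SCORES, WEIGHTS)) / total
pop_sd = (sum(w * (s - pop_mean) ** 2 for s, w in zip(SCORES, WEIGHTS)) / total) ** 0.5

rng = random.Random(2006)  # fixed seed so the run is repeatable
means = sample_means(sample_size=10, repeats=5000, rng=rng)

print(f"distribution: mean {pop_mean:.2f}, sd {pop_sd:.2f}")
print(f"sample means: mean {mean(means):.2f}, sd {pstdev(means):.2f}")
print(f"sigma / sqrt(n) predicts an sd of {pop_sd / 10 ** 0.5:.2f}")
```

The averages should centre on the mean of the underlying distribution, with a spread close to sigma divided by the square root of the sample size, even though the individual votes are far from bell-shaped.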

So, to summarize the summary, even if the ratings aren't distributed normally, they still reflect the true rating, and the standard deviation is also accurate.

To summarize the summary of the summary, the ratings system works fine.

To summarize the summary of the summary of the summary, people are stupid.
09-30-2006 at 12:35 AM
Tahnan
Level: Smitemaster
Avatar
Rank Points: 2460
Registered: 11-14-2005
IP: Logged
icon Re: Hold rating standard deviation (0)  
Right, yes, y'all explained what a standard deviation is, and like I said, that part I knew. It's just that it's so much easier to just look at the numbers and say, "Oh, huh, there's this one person who gave it a 2, but everything else is in the 8-10 range." Put another way: a small stdev suggests that everyone voted similarly; but a large one tells you only that the votes were somewhat scattered, but doesn't tell you how they were scattered. And that's what it would be nice to know.
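A small Python sketch of that last point, using two made-up sets of eight votes (not real ratings for any hold): both have the same mean and the same standard deviation, yet the per-score tallies look nothing alike. The `tally` helper is a hypothetical stand-in for the kind of full breakdown being requested.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical votes, invented for illustration only.
mostly_agree = [2, 6, 6, 6, 6, 6, 6, 10]   # one low vote, one high vote, the rest identical
split_camps = [4, 4, 4, 4, 8, 8, 8, 8]     # two camps, nobody in the middle


def tally(votes: list[int]) -> dict[int, int]:
    """Number of votes per score, lowest score first, skipping unused scores."""
    counts = Counter(votes)
    return {score: counts[score] for score in range(1, 11) if counts[score]}


for label, votes in (("mostly agree", mostly_agree), ("split camps", split_camps)):
    print(f"{label:>12}: mean {mean(votes)}, stdev {pstdev(votes):.2f}, votes {tally(votes)}")
```

Both lines report a mean of 6 and a standard deviation of 2; only the tally shows which of the two situations you are looking at.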
09-30-2006 at 06:44 AM
Banjooie
Level: Smitemaster
Avatar
Rank Points: 1645
Registered: 12-12-2004
IP: Logged
icon Re: Hold rating standard deviation (+1)  
Answer: Make the little standard deviation letter clickable, opening up exactly what votes were given to the hold.
09-30-2006 at 11:01 AM
michthro
Level: Smitemaestro
Rank Points: 679
Registered: 05-01-2005
IP: Logged
icon Re: Hold rating standard deviation (+1)  
agaricus5 wrote:
Although the distributions are not normal, there is a theorem called the Central Limit Theorem that states roughly that for any distribution with a mean, (μ), and finite standard deviation (σ), if you pick at random a certain number of people or things from it (in this case, you could pick a number of DRODders at random and ask them what they would vote), and average the numbers, you'll find that if you did it over and over many times, the average mean you'd get will be μ, and the means would be distributed approximately normally (i.e. the graph of means would be roughly bell-shaped), with a standard deviation of σ/√n, where n is the number of things you picked.
Seeing a reference to the Central Limit Theorem, I can't resist saying something. Let me first say that although the std deviation does give some useful information, I'd also prefer seeing the actual distribution, but it's not really important to me. The rest is slightly off-topic.

Let me also say in advance that if this turns into a normality rant, it's not aimed at you, agaricus5. I'm merely interested in the topic, and I know you did yourself indicate that there may be some shaky assumptions in your arguments. This is purely a discussion on a topic you mentioned, not a response to what you said.

Click here to view the secret text
...if the 45 votes the hold has now are representative of what people think, then...
Rather a big if? I'm not sure. I suppose 45 probably is getting there, but the bigger question is whether those who do vote form a biased or representative sample. (Another reason for Schik to yell at us to vote.)
I can be probably 95% sure that the true mean (i.e. the mean I'd get if I made everyone vote, including those who will play DROD in the future) lies between 8.16 and 9.34.
There I disagree. The part about future voters, I mean. Remove that and I'm with you. That future voters will vote the same way as current voters seems like an unjustified assumption to me. If anything, BD and your next example, HIJK, show that there is a tendency for old holds to be rated lower over time, so you can't construct confidence intervals for BD's rating a year from now based on current votes. If that assumption were valid, you'd be right, of course, otherwise you're entering the world of time-series analysis.
09-30-2006 at 02:51 PM
agaricus5
Level: Smitemaster
Rank Points: 1838
Registered: 02-04-2003
IP: Logged
icon Re: Hold rating standard deviation (0)  
michthro wrote:
Let me also say in advance that if this turns into a normality rant, it's not aimed at you, agaricus5. I'm merely interested in the topic, and I know you did yourself indicate that there may be some shaky assumptions in your arguments. This is purely a discussion on a topic you mentioned, not a response to what you said.

...
That's an interesting rant indeed. I definitely agree with you that the idea of a limit thingy is only really accurate at that limit (i.e. at infinity), and especially in statistics, with random fluctuations, you could be incredibly unlucky and happen to get a sample of means that's way removed from the true distribution of means you're supposed to get.

For instance, we should expect hold ratings to be normally distributed, since ratings tend to be a sum of various criteria. Rubbish. Btw, I'm not accusing you of this or the "means are normal" assumption, agaricus5.)
Agreed once more. In fact, hold ratings cannot possibly be normal because they are discrete, and can only be between 1 and 10.

...if the 45 votes the hold has now are representative of what people think, then...
Rather a big if? I'm not sure. I suppose 45 probably is getting there, but the bigger question is whether those who do vote form a biased or representative sample. (Another reason for Schik to yell at us to vote.)
Again, I will state that's a big assumption. However, seeing as 45 is currently the largest number of votes for any proper hold, and you can only work with what you have, I can't really do any better. Of course, the bias is a problem that needs to be accounted for, but without surveying people (and even there, there will be bias in such a subjective kind of survey), I can't do anything about it either, except ignore it.
I can be probably 95% sure that the true mean (i.e. the mean I'd get if I made everyone vote, including those who will play DROD in the future) lies between 8.16 and 9.34.
There I disagree. The part about future voters, I mean. Remove that and I'm with you. That future voters will vote the same way as current voters seems like an unjustified assumption to me. If anything, BD and your next example, HIJK, show that there is a tendency for old holds to be rated lower over time, so you can't construct confidence intervals for BD's rating a year from now based on current votes. If that assumption were valid, you'd be right, of course, otherwise you're entering the world of time-series analysis.
Again, I am assuming with what I have. Time is another dimension I am very aware of, and as I mentioned in the other thread about hold ratings, I know that ratings can change over time quite a lot. However, the analysis was purely for the purposes of working out what the s.d. was actually useful for, and given that I do not have a time machine, I cannot actually do any analysis without assuming a time-freeze. The idea I had in mind was that if everyone who will play DROD were to magically play B.D now and vote, the mean for any sample of 45 people would be likely to be between those numbers in the confidence interval. I realise that it is not going to be accurate in 1 year, but I wanted to make the point about the present by ignoring the future, since that's what's important right now.

Of course, I could put a more explicit assumption warning in my analysis if you'd like.

____________________________
Resident Medic/Mycologist

[Last edited by agaricus5 at 10-02-2006 12:40 PM]
10-02-2006 at 12:39 PM