Caravel Forum : Caravel Boards : General : Some thoughts on "difficulty"
Insoluble
Level: Smitemaster
Rank Points: 1639
Registered: 09-04-2014
Some thoughts on "difficulty" (+7)
I’ve been thinking a lot recently about difficulty in DROD. There has been talk over the years about how we rate difficulty in holds and how we talk about difficulty, and I realize there are other threads that touch on this topic. But the issue of difficulty and difficulty ratings came up in the chat today and I was hoping to continue the conversation. There seems to be a lot of agreement that difficulty is not really on a linear scale, and that seems to bear out when you look at hold ratings:
[Hidden text in the original post; contents not shown here.]

Some thoughts were bandied about based on this. One suggestion was to replace our current system having 20 distinct ratings with one having just 5. Another suggestion was to replace the current numeric scale with a more qualitative one. These are both interesting ideas and I'd love to hear more thoughts on them.

Another thing that I’m starting to realize is that difficulty is definitely not a one-dimensional measure. There are two broad categories of difficulty that I can see: conceptual difficulty (i.e. your classic linchpin puzzles) and executional difficulty (i.e. difficult manipulation, hack & slash). One easy way to think about the distinction is that conceptually difficult puzzles are ones that you could potentially “solve” without even having DROD loaded up in front of you. (I solved “That Room” in Lemming Beach in my sleep one night.) Once you know the solution to one of these puzzles, it’s easy to repeat it later on. Executional difficulty refers to the kind of puzzle where you know what tasks you need to accomplish, but the actual keystrokes and movements to get the job done are difficult to come up with. These kinds of rooms are often still difficult when you replay them.

Now I don't necessarily advocate replacing our current one dimensional difficulty scale with a multi dimensional one. That would probably be too messy. But it is an interesting thing to think about, and particularly relevant for taking into account when typing up a hold review. I also think that there is probably a bit of a continuum between these two types of difficulty, but it definitely seems like an accepted categorization by the DROD community at this point. Each kind of difficulty also seems to have subcategories that come up pretty frequently.

Conceptual (Lynchpin)
Edge case exploration: This involves exploiting some of the more arcane rules and edge cases of the DROD rule set. Once you know the specific rule or behavior, the task becomes easy.
Complex sequence of tasks: This type of difficulty involves figuring out the correct way to complete a sequence of tasks in a room. This could mean figuring out which tools to use for which tasks, and figuring out what order to complete the tasks in.
Efficiency (Getting the maximal amount of use from each element.): This kind of difficulty involves making use of the same game element in multiple different ways, and figuring out how to get the most use out of each element. You typically see this in a room where, for instance, if you just happened to have one extra powder keg the solution would be easy.

Executional
Horde management: Probably the most obvious type of difficulty in the execution category.
Individual monster manipulation: This type of difficulty involves finding a somewhat precise sequence of moves that will get monsters to do your bidding. You frequently see this with goblins and serpents, but it can apply to pretty much any monster type as well as mimics and other things.
Efficiency (execution with low move count): This is typically found in rooms with timers that require you to get the job done in as few moves as possible.

I'd be super curious to hear other subcategories of difficulty that people have identified.


05-04-2016 at 08:31 PM
skell
Level: Legendary Smitemaster
Rank Points: 3734
Registered: 12-28-2004
Re: Some thoughts on "difficulty" (+1)
Have some unstructured thoughts from yours truly.

An objective scale is like the halting problem - you can't find a solution for it. No matter what you do, you can't make it so that everyone's definition of some rating is the same. What I think I have is a three-step solution which does not solve the "objectively useful difficulty scale" but rather treats it as an XY problem, wherein the actual problem we're trying to fix is "when I look at this hold I want to know if I am likely to struggle with it".

Step 1: Streamlining rating definitions
Right now my definition of a 10 or 5 or 8 or whatever brain hold is probably very different from other people's definitions. I propose verbal difficulty ratings, ranging from easiest to hardest:

1 - I felt absolutely no challenge
2 - I felt challenged
3 - I barely solved this without help
4 - So hard I needed help
5 - So hard help wasn't helpful

Let's have the community vote on the holds a little, and then you can see it like this:

50% of players found this challenging, or
80% of players felt it was so hard they needed help

Now *this* is actually much more meaningful. You don't know how hard the hold will be for you, but at least you have understandable and consistent information about how people found the difficulty on average. We're getting somewhere.
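
A minimal sketch of how such a summary line might be computed from the raw votes. The function name `summarize`, the tie-breaking rule, and the sample votes are assumptions made for illustration; nothing here reflects how the forum actually stores ratings.

```python
from collections import Counter

# Verbal scale following the proposal above.
LABELS = {
    1: "I felt absolutely no challenge",
    2: "I felt challenged",
    3: "I barely solved this without help",
    4: "So hard I needed help",
    5: "So hard help wasn't helpful",
}


def summarize(votes):
    """Return (percent, label) for the most common rating among the votes."""
    if not votes:
        raise ValueError("no votes to summarize")
    counts = Counter(votes)
    # Ties go to the harder rating; an arbitrary choice for this sketch.
    rating, count = max(counts.items(), key=lambda item: (item[1], item[0]))
    percent = round(100 * count / len(votes))
    return percent, LABELS[rating]


# Hypothetical votes for one hold, not real forum data.
example_votes = [4, 4, 4, 4, 2, 4, 3, 4, 4, 5]
percent, label = summarize(example_votes)
print(f"{percent}% of players said: {label}")
```

Reporting only the most common answer keeps the line short; showing the full breakdown per label would be an equally reasonable choice.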

But it gets better! If you play a few holds yourself you'll start to see how well you fare against the average and can then figure out how hard something is likely to be... wait a second...

Step 2: Automated comparison system
The forum can do it for me! It can look at my voting trends, compare it with average and voila, we have immediate feedback:

On average players found it so hard they needed help, but with your skills it should be easier

Damn, I feel good now. The forum appreciates how awesome I am. Now I am sure there are players out there with skills similar to mine; after all, there are 8 billion people out there, we can't all be unique, right?
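
One way the comparison in step 2 could work, sketched under simple assumptions: take the average gap between my ratings and the community average on holds I have already rated, then shift the community average of a new hold by that gap. The names `personal_offset` and `predict`, the 1 to 5 scale, and the sample data are all hypothetical.

```python
from statistics import mean


def personal_offset(my_ratings, community_avg):
    """Average of (my rating - community average) over holds I have rated."""
    shared = [hold for hold in my_ratings if hold in community_avg]
    if not shared:
        return 0.0  # no history yet, so assume I match the average
    return mean(my_ratings[hold] - community_avg[hold] for hold in shared)


def predict(hold, my_ratings, community_avg):
    """Shift the community average by my offset, clamped to the 1-5 scale."""
    offset = personal_offset(my_ratings, community_avg)
    return min(5.0, max(1.0, community_avg[hold] + offset))


# Hypothetical data: I tend to rate one step easier than the average.
my_ratings = {"Hold A": 2, "Hold B": 3}
community_avg = {"Hold A": 3.0, "Hold B": 4.0, "Hold C": 4.0}

print(predict("Hold C", my_ratings, community_avg))
```

A negative offset means the player usually finds holds easier than the community does, which is the "with your skills it should be easier" case.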

Step 3: Comparing against similar players
Since we know how everyone votes we can easily find people who rate the difficulty like you. So the forum can find such people and show how they rated the hold.

On average players found it so hard they needed help, but with your skills it should be easier.
- skell (100% compatibility) said: So hard help wasn't helpful
- mxvladi (80% compatibility) said: I felt challenged


How cool is that? How cool am I to come up with this solution?
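
A rough sketch of step 3, assuming "compatibility" simply means the share of commonly rated holds on which two players gave the same rating. That definition, the function names, and the sample players are assumptions for illustration; the post does not pin down a formula.

```python
def compatibility(a, b):
    """Fraction of holds rated by both players where the ratings match."""
    shared = a.keys() & b.keys()
    if not shared:
        return None  # nothing to compare
    same = sum(1 for hold in shared if a[hold] == b[hold])
    return same / len(shared)


def similar_raters(me, others, hold):
    """List (name, compatibility, rating) for players who rated `hold`,
    most compatible first."""
    rows = []
    for name, ratings in others.items():
        if hold not in ratings:
            continue
        score = compatibility(me, ratings)
        if score is None:
            continue
        rows.append((name, score, ratings[hold]))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Hypothetical players and ratings on the 1-5 verbal scale.
me = {"A": 2, "B": 4, "C": 3}
others = {
    "player1": {"A": 2, "B": 4, "C": 3, "D": 5},
    "player2": {"A": 2, "B": 4, "C": 1, "D": 2},
    "player3": {"A": 1},  # has not rated hold D, so is skipped
}

for name, score, rating in similar_raters(me, others, "D"):
    print(f"- {name} ({score:.0%} compatibility) rated it {rating}")
```

Exact-match agreement is the crudest possible measure; a correlation or a distance between ratings would be a natural refinement.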

Step 4: Revenge of hidden step
While we're talking about it, we could steal the ideas from other sites and allow tagging for holds, although for big holds like Gigantic Jewel Lost or any main-game hold most tags will make little sense because, well, they cover a whole lot of various things. Saying that TSS is tagged with serpent manipulation, roach manipulation, goblins, constructs, kegs, time clones, advanced time clones, <snip> and lynchpins is as good as saying "almost everything".

____________________________
My website | Facebook | Twitter

[Last edited by skell at 05-04-2016 10:55 PM]
05-04-2016 at 10:22 PM
ErikH2000
Level: Legendary Smitemaster
Rank Points: 2794
Registered: 02-04-2003
Re: Some thoughts on "difficulty" (0)
I am laughing at the phrase, "so hard that help won't be helpful".

That is HARD.

-Erik

____________________________
The Godkiller - Chapter 1 available now on Steam. It's a DROD-like puzzle adventure game.
dev journals | twitch stream | youtube archive (NSFW)
05-04-2016 at 10:26 PM
uncopy2002
Level: Smiter
Rank Points: 431
Registered: 07-28-2014
Re: Some thoughts on "difficulty" (0)
Before we can discuss any "difficulty", we need to consider that there are two types of difficulty scale: end-user-friendly difficulty and critic-friendly difficulty. Almost all places use the former because the end result is usually a score or a true/false measure, which is very simple and easy to grok for the majority of users. Critics like me don't give scores; we judge the work as the work itself. If you want to grok this then you have to eat up all the details we're telling you, at which point most people just go tl;dr and escape from the situation.

The above is the underlying conflict in this discussion.

Then, given that we know we're already reducing the latter (a complicated criticism) to the former (a simple score), I'd pretty much say that the distribution is... actually quite good? I mean, even though it's not perfectly normally distributed, there are two important and good things: kurtosis is quite small IMO, and not every difficult/good hold has its difficulty/score degenerate at the highest range, which happens like all the time at most other sites. Also, I think quite a lot of the low scores are caused by the long tail of bad/mediocre holds back in the JtRH era. If you only count, say, TCB and later holds, the distribution should look better. In fact, I think it might skew in the opposite direction.

Besides, we also need to consider that user-made holds are user-made, and hence the general characteristics of user-made levels apply. In particular, in my experience it's almost always the case that user-made holds are harder, or trickier, rather than the opposite. Having more difficult holds (in number, not hardness), even more so than official ones, is a perfectly normal and anticipated outcome. Once you give the players the crayons and the instruction manual, they don't draw interactive manuals for said crayons.

[Last edited by uncopy2002 at 05-04-2016 10:56 PM]
05-04-2016 at 10:50 PM
uncopy2002
Level: Smiter
Rank Points: 431
Registered: 07-28-2014
Re: Some thoughts on "difficulty" (0)
I'll also comment on the different suggestions, in a separate post because it's a different matter:

Individual monster manipulation: This type of difficulty involves finding a somewhat precise sequence of moves that will get monsters to do your bidding. You frequently see this with goblins and serpents, but it can apply to pretty much any monster type as well as mimics and other things.

The idea is good, but collapsing them into the same spot is probably a bad idea - there are tons of monsters in DROD, and most of them behave distinctly differently. For example, most people are just terrible with wraithwing horde manipulation. You could be good at brained hordes but absolutely terrible at serpents and goblins. And the list goes on.

1 - I felt absolutely no challenge
2 - I felt challenged
3 - I barely solved this without help
4 - So hard I needed help
5 - So hard help wasn't helpful

This system has one big intrinsic problem - it's not judging the difficulty of the hold, it's judging the interaction between the player's skill and the difficulty of the hold, at the time when the rating is made. A player's skill has all sorts of short-term fluctuations and long-term trends that complicate the matter to no end, unless we require each player to revise their hold ratings every month (which, obviously, doesn't help much).

we could steal the ideas from other sites and allow tagging for holds

This is a good idea. And in fact one of the (presumably planned) features of the idea generator is to link rooms to concepts:

Linking rooms (options not yet available) - if you know a room in a published hold which utilizes this concept you can link this room to the concept. Linked rooms are clickable to view how they look.



By the way, I should also point out an elephant in the room - the statistics for most of the holds are seriously lacking. I counted the number of votes for all the holds published after 2013, and most of them have fewer than 10 ratings. Nearly all of them are under 15. Beethro and the cake has the most votes, and that's just 27. The common rule of thumb from basic statistics is that we need at least 30 for the statistics to be minimally confident (Central Limit Theorem).
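
To put rough numbers on the sample-size concern, here is a small sketch of how the standard error of an average rating shrinks with the number of votes. The spread of 2.0 rating points is an assumed value chosen purely for illustration, not something measured from forum data.

```python
import math


def standard_error(sample_std, n):
    """Standard error of the mean for n independent votes."""
    return sample_std / math.sqrt(n)


# Assumed spread of individual votes on the 10-point scale (hypothetical).
ASSUMED_STD = 2.0

for n in (5, 10, 27, 30):
    print(f"{n:>2} votes: average is uncertain by about "
          f"+/- {standard_error(ASSUMED_STD, n):.2f} points")
```

Under this assumption the uncertainty falls with the square root of the vote count, so going from 5 votes to 30 helps far more than going from 27 to 30.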

[Last edited by uncopy2002 at 05-04-2016 11:34 PM]
05-04-2016 at 11:33 PM
skell
Level: Legendary Smitemaster
Rank Points: 3734
Registered: 12-28-2004
Re: Some thoughts on "difficulty" (+1)
uncopy2002 wrote:
Individual monster manipulation: This type of difficulty involves finding a somewhat precise sequence of moves that will get monsters to do your bidding. You frequently see this with goblins and serpents, but it can apply to pretty much any monster type as well as mimics and other things.
The idea is good, but collapsing them into the same spot is probably a bad idea - there are tons of monsters in DROD, and most of them behave distinctly differently. For example, most people are just terrible with wraithwing horde manipulation. You could be good at brained hordes but absolutely terrible at serpents and goblins. And the list goes on.
You could say the same about movies - there are different things people enjoy in movies, some like special effects, others like acting, plot, backgrounds etc, yet it doesn't suddenly invalidate the value of those ratings. It literally works for everything, even restaurant ratings. Heck, it works right now, the current difficulty rating system is at least good enough. Not to mention unlike movies or games or restaurants you're not committed to the hold you chose, if you somehow find that the rating does not work for you, you stop playing.
uncopy2002 wrote:
1 - I felt absolutely no challenge
2 - I felt challenged
3 - I barely solved this without help
4 - So hard I needed help
5 - So hard help wasn't helpful

This system has one big intrinsic problem - it's not judging the difficulty of the hold, it's judging the interaction between the player's skill and the difficulty of the hold, at the time when the rating is made. A player's skill has all sorts of short-term fluctuations and long-term trends that complicate the matter to no end, unless we require each player to revise their hold ratings every month (which, obviously, doesn't help much).
Let me make another analogy to movies. There are a lot of critically panned movies which gained a cult following or are widely loved by the general, less objective populace. Heck, even critics who should be objective can't agree on a single thing. Take any decent movie you want and I can guarantee you'll find prolific critics who have at least two opposing opinions about something like a given actor's acting abilities. What I am getting at is that it's impossible to judge a hold separately from the player's skill, because, correct me if I am wrong, you can't come up with any absolute metric for that. There will always be a bunch of people who'll think otherwise. And sometimes you're going to be way wrong and the majority of people will disagree, a situation that won't arise if you average people's feelings.
Of course there is also the issue of people getting better over time and that's not something you can fix. But trying to give an objective rating will suffer from a similar problem - changing standards. Things that used to be hard are no longer hard because we got new tools, better holds teach players abilities faster etc.

uncopy2002 wrote:
By the way, I should also point out an elephant in the room - the statistics for most of the holds are seriously lacking. I counted the number of votes for all the holds published after 2013, and most of them have fewer than 10 ratings. Nearly all of them are under 15. Beethro and the cake has the most votes, and that's just 27. The common rule of thumb from basic statistics is that we need at least 30 for the statistics to be minimally confident (Central Limit Theorem).
I'll be nitpicky but it is not an elephant in the room, though it is a genuine issue. I don't think we need to worry about confidence because our statistics are just for entertainment, not for someone's scientific work.

Now before this goes too far, I personally think the current system works absolutely fine considering the number of players we have and the stream of new holds.
And I am not completely discarding the idea of having critics do elaborate reviews of holds, but that has problems of its own, some the same as and some different from those of using users' averages.

05-05-2016 at 12:14 AM
uncopy2002
Level: Smiter
Rank Points: 431
Registered: 07-28-2014
Re: Some thoughts on "difficulty" (0)
You need to consider the last point in conjunction with your own point:

You want to make an algorithm that makes suggestions based on how you rated the holds, and how others do.
But your statistics are very inadequate.

This would be a good idea, if only you had as much data as a big data company does.
And the fact is, most holds have very few scores, and the people who have given a lot of ratings are even fewer (probably no more than 50).
So I can't see there's any way this can produce any fairly good or accurate result.


Regarding your other points, yeah, nothing's perfect. But my point is that making new standards for ratings isn't necessarily better, and making things more complicated at everyone's expense when it doesn't help a lot doesn't sound like something worth actually doing.
05-05-2016 at 08:48 AM
skell
Level: Legendary Smitemaster
Rank Points: 3734
Registered: 12-28-2004
Re: Some thoughts on "difficulty" (+1)
Maybe you'll disagree with me, and that would explain why we can't find common ground, but I think the current system works perfectly for what it is supposed to achieve. If you start with this assumption you see that you don't need big data; even 5 difficulty ratings on a hold are a very decent estimate of what you can expect.

Let's skip step #1 because it's just giving a common context to the numbers and is completely unnecessary, though it's my personal belief that a verbal scale is much more useful than bare numbers as it allows for using the ratings in a consistent way.

What is step #2 about? Looking at your historic data and comparing it with other historic data, something you do already, in your head and less precisely. The results will be no less accurate than what everyone is already doing subconsciously.

What is step #3 about? Profiling you against other players who vote similarly, which works regardless of how many votes a given hold has, as long as people who're profiled like you have voted on that hold.

The point is you don't need high confidence for things which are as unimportant as hold difficulty ratings and the changes I proposed are almost exclusively additions allowing to use the same data in new ways to give players more options, not to increase complexity of the existing ones, because you can still use the old numeric rating in the same way, completely ignoring the rest.

So I can't see there's any way this can produce any fairly good or accurate result.
To reiterate my statements, #2 allows you to improve what players are already doing in their head. #3 does not need a lot of data, because even if I were to display "skell (3% compatibility) rated this hold: 10" it tells you a lot. You could even express compatibility as the share of ratings above and below yours, like "skell (+85% =5% -10%) rated this hold: 10", from which you know that skell is a worse player than you. There's a lot of design space for using small data if it's used only for entertainment.
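
The "+/=/-" breakdown could be computed along these lines. This is only a sketch: it assumes "+" means the other player rated a hold harder than I did, and the function name `breakdown` and the sample ratings are made up for the example.

```python
def breakdown(me, other):
    """Percentages of shared holds the other player rated above, the same as,
    and below my own rating. Returns None if there are no shared holds."""
    shared = me.keys() & other.keys()
    if not shared:
        return None
    n = len(shared)
    above = sum(1 for hold in shared if other[hold] > me[hold])
    below = sum(1 for hold in shared if other[hold] < me[hold])
    same = n - above - below
    # Each share is rounded separately, so the three may not sum to exactly 100.
    return tuple(round(100 * count / n) for count in (above, same, below))


# Hypothetical ratings on the current numeric scale.
me = {"A": 5, "B": 6, "C": 7, "D": 4}
other = {"A": 7, "B": 8, "C": 7, "D": 3}

above, same, below = breakdown(me, other)
print(f"other player (+{above}% ={same}% -{below}%)")
```

A large "+" share suggests the other player tends to find holds harder than I do, so their rating of a new hold probably overstates how hard it will feel to me.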

But my point is that making new standards for ratings isn't necessarily better,
I agree, the change of the base rating system is not required and I would not attempt to do it, but if I were designing a new system from scratch for something I'd certainly go with it.

If at this point you don't at least partially agree with me I think we'll have to agree to disagree to not monopolize the thread with discussing my stuff as there are many, many other systems to talk about.

05-05-2016 at 10:00 AM
enzi666
Level: Master Delver
Rank Points: 161
Registered: 01-05-2004
Re: Some thoughts on "difficulty" (+2)
What I want to know from a difficulty curve is, will I be overwhelmed by the difficulty or not?
In the case of DROD, should I play other holds before?

Extreme example, if you're not comfortable with goblin movement and you want to play a hold that expects goblin manipulation you'll have a hard time. And the only solution to this really is don't play it, start with KDD or something. The hold on the other hand might be quite easy, a 5.0 for veterans.

We have cool words for different kinds of puzzles so I'm thinking about a tagging system. It has the potential of guiding players and serving as a kind of unlock system for gained knowledge about the game and its mechanics.

KDD especially is perfect for this as each level has its own puzzle theme. If you finish the hold/level you probably have a good understanding of those elements and you "unlocked" the tags. We can then have holds that have these tags as prerequisites. So the hold I mentioned before now has an "advanced goblin manipulation" prerequisite. It doesn't stop you from playing it but it gives a good indication of what to expect.
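
A small sketch of how the tag prerequisites might be checked. The hold names, tag names, and both lookup tables are invented for the example; the idea is just a set difference between what a hold expects and what the player has already unlocked.

```python
def unlocked_tags(completed, teaches):
    """Union of all tags taught by the holds the player has completed."""
    tags = set()
    for hold in completed:
        tags |= teaches.get(hold, set())
    return tags


def missing_prerequisites(hold, requires, completed, teaches):
    """Tags the hold expects that the player has not unlocked yet."""
    return sorted(requires.get(hold, set()) - unlocked_tags(completed, teaches))


# Hypothetical data: which tags a hold teaches and which it expects.
teaches = {
    "Starter Hold": {"roach basics", "basic goblin manipulation"},
    "Goblin Workshop": {"advanced goblin manipulation"},
}
requires = {
    "Tricky Hold": {"basic goblin manipulation", "advanced goblin manipulation"},
}

missing = missing_prerequisites("Tricky Hold", requires, ["Starter Hold"], teaches)
if missing:
    print("You can still play it, but expect:", ", ".join(missing))
```

The check only produces a warning, matching the point that a prerequisite should indicate what to expect rather than lock the hold.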

The whole thing can turn into a game itself. Gotta catch those tags and learn everything DROD has to offer. Spoiler alert: A lot!

____________________________
58th Skywatcher

DROD AE: Finished - KDD2: Mastered
JTRH: Mastered
TCB: Mastered - RPG: Finished
GatEB: Mastered - TSS: Mastered
05-10-2016 at 04:34 AM