Pursuing the optimal solutions in DROD RPG

Announcement: Be excellent to each other.

Caravel Forum : Caravel Boards : General : Pursuing the optimal solutions in DROD RPG

Poster

Message

Meta
Level: Master Delver
Rank Points: 106
Registered: 12-06-2016
IP: Logged

File: TLP.png (115.9 KB)
Downloaded 84 times.
License: Public Domain

Pursuing the optimal solutions in DROD RPG (+13)

I Background
------------

I am terrible at playing DROD RPG. Even for the published holds that are rated as very easy, I can barely conquer them. After finishing some holds and looking at the high scores, I almost convinced myself that those scores are not legit, since they looked impossibly high to reach. This motivates me to figure out the optimal solutions, not just the highest known score on the leaderboard, so that I can prove that those scores are fake. DROD is open source after all, and it should not be too hard to create a fake demo. (Later, I discovered that many of those scores are legit, and not yet optimal.)

I got distracted quickly, until Mike announced the Twisty Little Passages (TLP) late 2018. The puzzles there are claimed to have essentially unique solutions. I did not want to spend much time verifying the claim and solving the puzzles myself. So, I wrote a TLP solver, whose front end is a puzzle editor written in Python (together with some hand-drawn tile set to avoid possible copyright issue), and whose back end is a puzzle solver written in C++. Although the algorithm used is a simple brute force searching, it worked perfect on the demos of TLP, and proved that the solutions are essentially unique.

The success encouraged me to try it on some easy DROD RPG holds. Not surprisingly, my naive program did better than me, but still far worse than the high scores. The main reason was that I only spent half an hour to write the program, and I wrote a lot of dirty ad-hoc codes. The program was very slow, and both CPU and memory hungry. My old laptop couldnot handle it.

Then, I put it aside for a while. When TLP collected shipping information, I recalled that I had written such a program. I decided to invest some time in doing it better. I managed to get several optimal solutions this time, and I would like to share some information with you in this article.

In Section II, we will analyze the optimization problems, together with its decision counterparts, and their complexities. In Section III, some algorithms used in practice are shown. In Section IV, some potential issues are discussed. In Appendix, we will list the holds that we have tried, and show some tricks and how we use the algorithm to optimize them.

II Complexity
-------------

In DROD RPG, we have a player, who has some stats (for example, HP, ATK, YK). A hold consists of many rooms which are connected with each other in certain ways, and each room consists of some resources (for example, potions, gems, or even monsters). The player wants to go from one room, called entrance, to another room, called exit, without ever running out of resources. Later, we will simply refer this as the puzzle.

Note that the rooms here are referring to an abstract concept of the rooms in DROD. They are more like the corridors in DROD. We assume the following axioms:
* if the player decides to visit a room, then the player must gather every resource in the room in a predescribed order (for example, yellow door, monster, then yellow keys);
* the player can only visit a room that is adjacent to a visited room;
* the player can only visit a room if the visited rooms satisfy some predescribed property (this allows the puzzle to create situations like choosing at most one from multiple rooms);
* a room cannot be visited twice.

These axioms alone are not enough to cover all the possible puzzles. We will use it as a base line, and add more mild axioms if necessary.

Let us consider one very simple class of puzzles, a puzzle consisting of only one kind of resource. For example, a hold consisting of only yellow keys and yellow doors (1 YD can be viewed as -1 YK), or a hold consists of only potions and monsters/hot tiles that deal only constant damages regardless the player's stats. In 2018 Year of RPG Compilation, the entry Suprematism of Health by Skell is such a hold.

The optimization problem asks us to find a way to go to the exit with largest amount of resources and without ever having a negative resource at any time. The decision counterpart is simply asking to find such a way from the entrance to the exit. These two problems are related each other, and we leave it as an exercise to the readers.

Prop. The puzzle decision problem is NP-Complete, and the puzzle optimization problem is NP-Hard.

Proof.
We view a room as a vertex weighted by an integer. Suppose that we have a (edge-)weighted graph G whose edge-weights are nonnegative integers. We create a vertex-weighted graph H consists of two type of vertices:
* Type I: vertices with large fixed positive integers (compared with the edge-weights of G), which represents the vertices of G,
* Type II: vertices with weights being the negation of the edge-weights of G, which represent the edges of G,
so that the graph H encodes the graph G. Put it in another way, the graph H is the subdivision of G such that exactly one vertex is added for each edge, and that the vertices are suitable weighted.

Now, considering the puzzle optimization for the graph H. Clearly, we need to visit all type I vertices, and we want to visit a small amount of Type II vertices so that the visited vertices are connected at all time, and the total weights of the visited vertices are as small as possible. Since the weights of Type I vertices are fixed, we only need to optimize the total weights of visited Type II vertices. If we translate this problem from graph H to graph G, we will see that we need to find a Steiner tree for the graph G.

It is well-known that the Steiner tree optimization problem is NP-Hard, and Steiner tree decision problem is NP-Complete. Therefore, the puzzle optimization problem and the puzzle decision problem are both NP-Hard.

On the other hand, if we have a way to go from the entrance to the exit, such a way is a certificate of polynomial length for the puzzle decision problem, hence the puzzle decision problem are in NP. Thus, the results follow.
Q.E.D.

This proposition implies that we should not expect any effective algorithm to find the optimal solutions, unless P = NP. But we still have some good news. If we only care about near optimal solutions, for instance, a 70% optimal solution (more precisely, 1 / log 4), then a polynomial time algorithm exists for Steiner tree problem. Therefore, we still have some hope to expect a polynomial time algorithm to solve the puzzle.

III Algorithm in practice
-------------------------

We first discuss algorithm of finding (global) optimal solutions, and later algorithms finding local optimal solutions.

III.1 Dynamic Programming
-------------------------

In many puzzles, we can see that the following additional axiom is satisfied:
* except for HP, the stats of the player after visiting a given set of rooms does not depend on the order of the rooms visited.

At the first glance, some common room does not satisfy the axiom, such as a room with a choice of either ATK gems or DEF gems. But, we can always construct the rooms (recall that the room is an abstract concept) in a ways that they satisfy the axiom and still resemble the effect we want. In the scenario mentioned above, we can create two rooms:
* Room A: ATK gems,
* Room B: DEF gems,
and require that:
* Room A can only be visited if Room B has not been,
* Room B can only be visited if Room A has not been.

Let us give another example here. Suppose that we have a hold with two levels so that we cannot backtrack from the second level the first level. Then, we can use the following constraints:
* Rooms in the first level can only be visited if none of the rooms in the second level have been visited.

With this additional axiom, given a specific set of given rooms, we only need to optimize the HP, since all other stats are already determined from the set itself. And the puzzle optimization problem reduces to HP optimization problem for all subsets of rooms that contain the exit. Therefore, it can be solved using dynamic programming.

In the worst case scenario, the time complexity is exponential. But, we already know that the puzzle optimization problem is NP-Hard, and we cannot hope a much better algorithm. In practice, there are many ways to speedup the dynamic programming.

III.2 Local Constraints
-----------------------

In many fields, we have some kind of local-global principle, that it, some global object is equivalent to all its corresponding compatible local objects. In this game, although we do not quite have an equivalence, at least we know that a global optimal solution should be a local optimal solution everywhere. By saying a solution is local optimal, I mean that it cannot be improve by making some "local" moves.

One such kind of move is simply changing the order in which the rooms are visited locally. Let us show an example here. Suppose that in a solution (i.e. a sequence of rooms), the player visit Room A at time 3, and visit Room B at time 6. A swap will let the player visit Room A at time 6 and visit Room B at time 3. If the solution is local optimal, then the solution after the swap should not be better than the original solution.

There are more kinds of local moves, and we will not discuss them here. From now on, we assume that we have some consense on the meaning of local moves.

It is quite easy to check if a solution is local optimal. It means that, given a solution, we can always try to improve it until it reaches a local optimal. The difficult part is how to get to the global optimal when we stuck at the local optimal. I do not have a good way, but I would try some deep neural network and see if it works.

Nevertheless, we have some algorithm that can be used to improve existing solutions, and the local constraints can be used to reduce the amount of states that are needed to be checked in the dynamic programming mentioned above.

IV Possible issues
------------------
In order to run any algorithms, we first need to model the holds in a way that the axioms are satisfied. If the number of the rooms is small, we do have a practical algorithm to determine the optimal solutions, and when that is large, at least we can get a local optimal solutions. So, if we can model the holds correctly and simplify the model, we have a high chance to get the optimal solutions in reasonable time.

On the other hand, I usually find in practice that a lot of mistakes could happen at the modeling and simplification step. If we overlook a trick needed, the algorithm can only give the optimal solutions under the assumption that the trick is not used. For example, it is quite easy to overlook possible door walkings.

The dynamic programming algorithm will use a lot of resources. We need to store information for a lot of states somewhere. We can use local constraints to reduce the number of states dramatically. I have not implemented many such constraints yet, and for many holds, my old laptop is very slow and does not have enough memory. I may ended up writing some better codes, and putting the data on the hard drives instead of the memory.

Appendix
--------
I will show below, for some small holds (ordered by hold ID), how I use the algorithm to optimize it, and possible some tricks used. For most of them, I either failed to conquer them myself, or only get a very low score myself. I am amazed by the strategies discovered by the program. Now I can see how a person who is good at this game will play, and I know for sure that the high scores on the leaderboard are real. If you want to know exactly how my program plays, just send me an email.

Hold 444: 67-th Grayman adventures by Slava
-------------------------------------------
The first score point has a unique solution. For the second score point, my program optimizes the amount of HP to take into 1N2W, and the order to pick up gems and fight spider.

Hold 484: Judy's Mansion by Edward
----------------------------------
Although it is rated as the easiest published hold, my program tells me that it is far from being easy to get the optimal solutions for all score points. For almost every score point, we need to do something different, especially for the one killing aumtlich due to the existence of aumtlich beam. One trick for this hold is to use door walking to get 2 yellow keys and 1 green key by killing only one monster.

Hold 643: Ten-Minutes Distraction by Suwako
-------------------------------------------
Get sword, then shield. Gather DEF first if possible.

Hold 669: 2016 Year of RPG Compilation: One by Bomber50
-------------------------------------------------------
This is a typical puzzle that can be solved easily by the algorithm.

Hold 669: 2016 Year of RPG Compilation: This or That by Bomber50
----------------------------------------------------------------
Due to its structure, we can speed up the algorithm dramatically and get the optimal solution in milliseconds, despite that it is quite long. I have no way to figure out the magic sequence myself.

Hold 669: 2016 Year of RPG Compilation: Escalation by Red-XIII
--------------------------------------------------------------
The puzzle is a bit long to use the algorithm directly. So, I have to apply the algorithm to various key steps. Since the multiplier is very high, applying the algorithm to key steps should not effect the final result much. By the way, I used the wall walking to get the gems once I have 800 GR (door walking in, wall walking out).

Hold 670: One Way by Red-XIII
-----------------------------
We want to preserve as many keys as possible, which means that it is not that easy to get back once we pass 6N. I used the algorithm to optimize before 6N, and after 6N, with reasonable simplification.

Hold 675: Rock Island by Kieran Millar
--------------------------------------
The most important step is to get 36 ATK. We can either go clockwise, or go counter-clockwise. One direction is far superior than the other. After making the initial moves, the algorithm takes care of the rest.

Hold 679: Get Your Stuff Back by Hyperme (first score point)
------------------------------------------------------------
One advantage of having a program is that we can test the best altar use without thinking about it ourselves.

Hold 705: Royal Rescue by Bomber50 (first score point)
------------------------------------------------------
We need to use door walking to get the shield.

Hold 708: Goblin King's Fortress by Kieran Millar (first score point)
---------------------------------------------------------------------
There are too many places where the pickaxe can be used. It turns out that we do not want to use the pickaxe at all.

Hold 711: Goblin Hollow by Bomber50
-----------------------------------
My algorithm teaches me that the priority is to get 20 DEF at 2N1W and 2N1E.

Hold 718: Throw by Xindaris
---------------------------
I am surprised to know that we want to kill Mud Baby at 2N1E first in order to get 24 ATK.

Hold 753: Minimal Design by Tio
-------------------------------
This hold is very complicated to model. I am not sure if the solution given by my program is optimal.

[Last edited by Meta at 01-09-2020 06:00 AM]

01-08-2020 at 07:00 AM

Quote Reply

kieranmillar
Level: Smitemaster
Rank Points: 3771
Registered: 07-11-2014
IP: Logged

Re: Pursuing for the optimal solutions in DROD RPG (+2)

Very cool and interesting. Given DROD RPG has no randomness I always knew it was ripe for someone to write a program to help optimise many of the scorepoints. The fact you've found a higher score on Throw is very impressive, I'd considered that one to be fully optimised. I myself have never written such a program and have achieved all my scores largely through a playstyle that relies largely on intuition (outside of my Optimisng RPG videos), with many of my 1st places largely coming from having no competition for a while. It's great to have new targets to aim towards and have it be proven once and for all what the actual highest scores for some of these are.

Regarding the legitimacy of the highscores, DROD RPG actually records your move sequence and when the score is sent to the spider, your moves are played back by the spider in order to verify that the score was obtained legitimately. So Caravel Net highscores that have been verified (those with a non-zero score) are almost all legitimate. The cases where they are not are ones where the leaderboards themselves have clearly had some sort of error, like the top scores for this scorepoint ( http://forum.caravelgames.com/rpgscores.php?action=pointscores&id=13155 ) quite clearly being for the wrong scorepoint. By comparison, the Steam leaderboards have no such verification, and got torn to pieces on day 1.

The sad news is that the spider is not open source code, and a hard drive failure knocked it out and took all of the sourcecode with it. So the RPG spider has been down for a year and might not be coming back. So the leaderboards have only been updating with technically unverified scores, although I personally have no doubt on the legitimacy of your and greenscience's impressive scores.

[Last edited by kieranmillar at 01-08-2020 11:15 PM]

01-08-2020 at 11:08 PM

Quote Reply

greenscience
Level: Master Delver
Rank Points: 168
Registered: 02-27-2015
IP: Logged

Re: Pursuing the optimal solutions in DROD RPG (+3)

I had written a small program before to fully optimize the solution to 2016 Year of RPG Compilation: One but did not try to extend the approach to other holds since I knew the complexity would be exponential. My current approach is also largely intuitional using on heuristics for figuring out a locally optimal order for gems while picking up enough keys and health to progress. In addition I keep track of the player's status after collecting each resource on a spreadsheet so I could quickly check whether to swap the order in which I obtain two resources, estimate whether collecting a certain gem saves enough health to cover the cost of its keys, or figure out when to leave behind potions to cross hot tiles or Aumtlich beams.

Accessories and scripting can create tricks that are especially interesting for optimization. With regards to accessories, 2018 Year of RPG Compilation: The Heist requires knowledge of more obscure features of accessories whereas Castle Repton encourages re-evaluating where to use accessories for each score point. Scripting-wise Growing has monsters whose stats grow over time, Washed Ashore has score points that reward other metrics like total moves, rotations, monsters killed/spared and several holds have moving monsters for more traditional DROD puzzles. Fetch the Pie might be the best example of the latter although I have not reached the moving monsters yet.

01-09-2020 at 08:53 PM

Quote Reply

mrimer
Level: Legendary Smitemaster

Rank Points: 5460
Registered: 02-04-2003
IP: Logged

Re: Pursuing the optimal solutions in DROD RPG (+2)

I just had some time to come here to check this topic out and give it the attention it deserves.

Impressive work!

I can relate to your initial reactions. When I first played Tower of the Sorcerer, I remember seeing how high some of the global scores were and thinking there was no way they were real. Over many years, I found that all (but possibly the top five or six places, which I still have no idea how to reach; I still wonder whether they only appeared due to a code bug or hack, but I digress) actually were, and that was a revelation. A very fun exploration into the world of optimization that has stuck with me all these years.

I'd love to hear how a deep net could learn to solve DROD RPG. My knee-jerk reaction is to expect there isn't enough data to train a huge ol' net on, which needs millions of training instances, but what do I know

Second, I'd be interested in what topology works best for an application like this. Probably ones similar to what was used for chess, or with Alpha Go? Or a more fluid one like for Starcraft?

[Last edited by mrimer at 02-01-2020 01:20 AM]

02-01-2020 at 01:16 AM

Quote Reply

greenscience
Level: Master Delver
Rank Points: 168
Registered: 02-27-2015
IP: Logged

Re: Pursuing the optimal solutions in DROD RPG (+3)

The situation with Tower of the Sorcerer global scores is a bit confusing. From looking at the Internet Archive, it looks like the leaderboard migrated from

http://hp.vector.co.jp/authors/VA013374/game/emtrank0.html

http://www7.atpages.jp/sorcerer/indexe.html

some time during 2008. This coincides with the release of a new version for TotS 3D which fixed bugs with merchants and updated save data format.

The top score in the old leaderboard were several points higher than the top score in the new leaderboard. Most people did not resubmit scores for the new leaderboard but those who did received lower scores, suggesting that they relied on bugs or features exclusive to the previous TotS version. The old leaderboard does not support downloading demos of the top 3 scores for each checkpoint like the new leaderboard does it is unclear whether other high scores in the old leaderboard would be valid. The old top score for floor 20 after defeating the Vampire (counterpart to Goblin King) is most obviously impossible since it requires having the equivalent of 66 yellow keys whereas the other high scores had the equivalent of at most 10 to 12 yellow keys.

[Last edited by greenscience at 02-01-2020 08:41 PM]

02-01-2020 at 08:29 PM

Quote Reply

	New Topic New Poll Post Reply
Caravel Forum : Caravel Boards : General : Pursuing the optimal solutions in DROD RPG	Surf To:

Forum Rules:
Can I post a new topic? No
Can I reply? No
Can I read? Yes
HTML Enabled? No
UBBC Enabled? Yes
Words Filter Enable? No

Sign in or Register	Lost your password?
Username:	Password:
	Remember me