Find Treasure App — Ileya Game

What we learnt before, during and after

On Thursday, July 30th, 2020, we created the Twitter account for FindTreasure.app and posted its first tweet.

What followed was an interesting 24-hour journey of making sure the game was a success.

If you’re not familiar with how FindTreasure.app works, I suggest you check it out, and play the Test game at least before continuing.

During this time, we learnt a few things about user behavior and developer mistakes that we think are worth sharing.

1. Every loophole will be exploited

While building, we considered that a Contestant could create multiple accounts and run parallel instances of their script, each with a different JWT obtained when creating those accounts, giving themselves an advantage over others.

This was a loophole we were willing to leave open, our reasoning being “Let’s see how long before Devs figure it out”.

It appears the answer to “How long?” is “Not long at all”.

The first 20 signups had emails that were variations of john.doe@example.com.

Developer John Doe had obviously figured out our little secret.

We see you, John Doe!

After the game was over, someone (or several people) registered 600+ accounts with the pattern “haha+HASH@example.com”, e.g. haha+HS1D59DAR91DADZW@example.com.

What we learnt

This is why email verification and filtering exist.

We will be removing all suspicious emails, adding filters to remove disposable email addresses, and enforcing email verification before accounts can participate in new games.
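As a rough sketch of what such filtering could look like (this is not the FindTreasure.app code), the snippet below collapses “+tag” aliases such as the haha+HASH pattern into one canonical address and groups signups that share it. The names canonicalEmail, usesDisposableDomain and findDuplicateSignups, and the one-entry blocklist, are all hypothetical.

```typescript
// Hypothetical blocklist; a real one would come from a maintained source.
const DISPOSABLE_DOMAINS = new Set<string>(["mailinator.com"]);

// Reduce an address to a canonical key: lower-cased, with any "+tag" removed.
// Returns null when the input does not look like local@domain.
function canonicalEmail(email: string): string | null {
  const cleaned = email.trim().toLowerCase();
  const at = cleaned.lastIndexOf("@");
  if (at <= 0 || at === cleaned.length - 1) return null;
  const rawLocal = cleaned.slice(0, at);
  const plus = rawLocal.indexOf("+");
  const local = plus === -1 ? rawLocal : rawLocal.slice(0, plus);
  if (local.length === 0) return null;
  return local + "@" + cleaned.slice(at + 1);
}

// True when the address belongs to a domain on the (assumed) blocklist.
function usesDisposableDomain(email: string): boolean {
  const canonical = canonicalEmail(email);
  if (canonical === null) return false;
  const domain = canonical.slice(canonical.lastIndexOf("@") + 1);
  return DISPOSABLE_DOMAINS.has(domain);
}

// Group signups by canonical address and keep only groups with duplicates.
function findDuplicateSignups(emails: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  emails.forEach((email) => {
    const key = canonicalEmail(email);
    if (key === null) return;
    const bucket = groups.get(key);
    if (bucket === undefined) groups.set(key, [email]);
    else bucket.push(email);
  });
  const duplicates = new Map<string, string[]>();
  groups.forEach((bucket, key) => {
    if (bucket.length > 1) duplicates.set(key, bucket);
  });
  return duplicates;
}
```

Canonicalizing only catches aliases of a single mailbox. It does nothing about one person owning several real mailboxes, which is why email verification is still needed alongside it.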

2. Devs will make mistakes

An hour into the game on Friday, tensions were high. More than 90% of the first 500 of the Maze’s 3000 nodes had been covered, but no one was moving past those first 500.

It took a few minutes to realize why this was happening. You see, we (read: one of us) had made a change the day before so that the graph that makes up the Maze was generated in batches of 500 and then joined together, because this was much faster than generating the entire thing at once.

The problem was that we hadn’t done a good job of joining the batches together, so they were disconnected. Users had zero chance of getting past the first 500.
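To illustrate the kind of bug this was, here is a minimal sketch of batched generation with a reachability check that would flag disconnected batches before a game starts. It is not our actual generator: the random-tree batches, the joinBatches flag and every function name are assumptions made for the example.

```typescript
type Graph = Map<number, Set<number>>;

// Add an undirected edge, creating either node if it does not exist yet.
function addEdge(graph: Graph, a: number, b: number): void {
  if (!graph.has(a)) graph.set(a, new Set<number>());
  if (!graph.has(b)) graph.set(b, new Set<number>());
  graph.get(a)!.add(b);
  graph.get(b)!.add(a);
}

// Build one batch as a random tree over node ids [start, start + size).
function addBatch(graph: Graph, start: number, size: number): void {
  if (!graph.has(start)) graph.set(start, new Set<number>());
  for (let id = start + 1; id < start + size; id++) {
    const parent = start + Math.floor(Math.random() * (id - start));
    addEdge(graph, id, parent);
  }
}

// Generate the maze batch by batch. When joinBatches is false the batches
// are left as separate islands, which reproduces the bug described above.
function buildMaze(totalNodes: number, batchSize: number, joinBatches: boolean): Graph {
  const graph: Graph = new Map();
  for (let start = 0; start < totalNodes; start += batchSize) {
    addBatch(graph, start, Math.min(batchSize, totalNodes - start));
    if (joinBatches && start > 0) {
      // Link this batch to a random node of the previous (full-size) batch.
      const previous = start - 1 - Math.floor(Math.random() * batchSize);
      addEdge(graph, start, previous);
    }
  }
  return graph;
}

// Count the nodes reachable from a starting node with an iterative traversal.
function countReachable(graph: Graph, from: number): number {
  const seen = new Set<number>([from]);
  const stack: number[] = [from];
  while (stack.length > 0) {
    const node = stack.pop()!;
    const neighbours = graph.get(node);
    if (neighbours === undefined) continue;
    neighbours.forEach((next) => {
      if (!seen.has(next)) {
        seen.add(next);
        stack.push(next);
      }
    });
  }
  return seen.size;
}
```

Comparing countReachable(maze, 0) with the total node count is a cheap pre-launch check: with 3000 nodes in batches of 500, a properly joined maze reports 3000 and an unjoined one reports 500.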

We were all on a Hangouts call while the game was happening, so we discussed this and resolved to push a new commit to master, fixing the problem while the game was still on.

Thankfully, we’d painstakingly set up a great CI/CD pipeline, so this deployment went smoothly. The server restarted with the fix, Contestants didn’t even notice (at least, we hope not), and soon they began hitting new nodes.

What we learnt

Things will go wrong, but we must make sure we can respond and fix them quickly when they do.

3. Users will make mistakes

At the end of the game, we had NGN 9,000 left, after users had found all 50 treasures. This meant that 9 times, one or more Contestants had supplied a wrong account number in their request.

To combat this, we added 4 treasures towards the end of the game, as we thought there had been 4 mistakes rather than the actual 9.

We had a Contestant reach out to us on Twitter, complaining about this, but we had made no plans for resolving a situation like this during the game.

What we learnt

We have two options here:

  • Make it clear that user mistakes will not be resolved
  • Follow up and resolve user mistakes after the game

4. Rate Limiting saves server lives

The first time we had an In-House game, one of us wrote some code that took advantage of Promise.all(…) to call all returned endpoints. It was so fast, someone watching described it as scary, and we all agreed to implement rate-limiting as a way of preventing this form of traversal.

To explain how the Rate per Minute (RPM) was decided, we’ll give you a sneak peek at how the Maze works.

Every node endpoint has a weight, a random number between 3 and 10, which is used to delay its response. The Rate per Minute (RPM) of a game is calculated as 60 / nodeWeightAverage. This is what informs the 429 status codes you get when your code becomes too eager.

This meant that our server could not be bombarded by too many requests from the same account, and this played a part in keeping it accessible to every contestant and spectator.
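As a hedged illustration of the formula, the sketch below computes the RPM from a list of node weights and applies it with a simple per-account counter. Only the 60 / nodeWeightAverage formula and the 429 status come from the description above. The fixed one-minute window, the rounding down of the limit and the names requestsPerMinute and FixedWindowLimiter are assumptions.

```typescript
// RPM as described above: 60 divided by the average node weight.
function requestsPerMinute(nodeWeights: number[]): number {
  if (nodeWeights.length === 0) throw new Error("at least one node weight is required");
  const total = nodeWeights.reduce((sum, weight) => sum + weight, 0);
  return 60 / (total / nodeWeights.length);
}

// Assumed strategy: a fixed one-minute window per account.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windows = new Map<string, { startedAt: number; count: number }>();

  constructor(limitPerMinute: number) {
    // Assumption: a fractional RPM is rounded down to whole requests.
    this.limit = Math.floor(limitPerMinute);
  }

  // Returns the status code a server could answer with: 200 or 429.
  // The clock is passed in (milliseconds) so the behaviour is easy to test.
  check(accountId: string, nowMs: number): 200 | 429 {
    const current = this.windows.get(accountId);
    if (current === undefined || nowMs - current.startedAt >= 60000) {
      this.windows.set(accountId, { startedAt: nowMs, count: 1 });
      return 200;
    }
    if (current.count >= this.limit) return 429;
    current.count += 1;
    return 200;
  }
}
```

For example, if the weights in a game happen to average 6.5 (the midpoint of 3 and 10), the formula gives 60 / 6.5, or roughly 9 requests per minute per account.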

5. Race Conditions are a menace

A few days before launch, one of us found a potentially devastating bug.

When multiple runners were run against the Test game without rate limiting, they would hit the treasure node at the same time, and ALL GET PAID.

This was huge!!

It was the result of a race condition, which is common in async flows like the ones we were dealing with.

We discussed potential solutions, and someone came up with a good mitigation plan: shuffle the endpoints returned in the response. Combined with the delay and rate limiting, this should drastically reduce the chances of it happening.
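The shuffle itself can be sketched with a standard Fisher-Yates pass. This is an illustration rather than our server code; the function name shuffled and the injectable random parameter are made up for the example.

```typescript
// Return a shuffled copy of the list using the Fisher-Yates algorithm.
// `random` is injectable (defaulting to Math.random) to make testing easier.
function shuffled<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const result = items.slice();
  for (let i = result.length - 1; i > 0; i--) {
    // Pick an index in [0, i] and swap it into position i.
    const j = Math.floor(random() * (i + 1));
    const tmp = result[i] as T;
    result[i] = result[j] as T;
    result[j] = tmp;
  }
  return result;
}
```

Because each response presents the endpoints in a different order, parallel runners are less likely to walk the Maze in lockstep and reach a treasure node at the same instant. It lowers the odds of the race but does not remove it.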

As far as we know, the race condition happened only once during the game: it ended with 55 treasures found out of 54.

55 of 54 treasures found

What we learnt

Whatever can possibly go wrong will go wrong. Band-aids like this only hold for a short while, so we will be fixing this problem before the next iteration of the game.
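One possible shape for a proper fix, sketched under the assumption of a single Node.js process, is to make claiming a treasure a synchronous check-and-set that happens before any await, so only the first runner to arrive can trigger a payout. TreasureLedger, handleTreasureHit and the pay callback are hypothetical names, not part of the real app.

```typescript
// Records which account claimed each treasure. tryClaim contains no await,
// so within one event loop nothing can interleave between check and write.
class TreasureLedger {
  private readonly claimedBy = new Map<string, string>();

  tryClaim(treasureId: string, accountId: string): boolean {
    if (this.claimedBy.has(treasureId)) return false;
    this.claimedBy.set(treasureId, accountId);
    return true;
  }
}

// Hypothetical handler: claim first, pay afterwards.
// `pay` stands in for whatever actually transfers the prize.
async function handleTreasureHit(
  ledger: TreasureLedger,
  treasureId: string,
  accountId: string,
  pay: (accountId: string) => Promise<void>
): Promise<boolean> {
  if (!ledger.tryClaim(treasureId, accountId)) return false;
  await pay(accountId);
  return true;
}
```

The race appears when the handler checks whether a treasure is taken, awaits something (a database read, a payment call), and only then marks it as taken. With several server processes an in-memory map is not enough, and the same claim-first idea would need an atomic operation in the shared data store, such as a unique constraint on the treasure.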

If you are looking forward to the next game, follow @FindTreasureApi on Twitter and turn on notifications. There will be a game announcement soon.
