Quote: Originally Posted by Snedhepl
Hey everyone!

We wanted to stop in and let you all know a little more about what’s been going on over the last several days in RIFT. As you may know, things are not working as smoothly as normal, and I'd like to tell you all why and what's being done to fix it.

Recently we made an update to the way we handle guilds. Before 3.3, all guilds were loaded up by the server regardless of whether they were needed or not. This caused high memory usage on what we call the World Server. This server handles coordination between all our different servers, acting as a sort of Grand Central Station. The more memory it uses, the worse off it is. With 3.3 we introduced a system we refer to internally as 'On-Demand Guild Loading'. The idea itself is simple: only load guilds we actually need to work with. Of course, in practice it's harder: our servers assumed (rightly at the time) that all guilds were available all the time.

In a large game such as RIFT, different systems carry different amounts of risk. This is the sort of change that does in fact come with a fair bit of risk. What makes it especially challenging, however, is that many of the problems that could happen won't show until something is live. Guilds are like that: guilds are conglomerations of hundreds and thousands of players, and that is difficult if not impossible to fully simulate internally. Do not take this to mean we haven't tested this. We have tested it extensively; this code has been live internally for months.

There have been a couple of different problems you all have seen, and I'll go over the major server issues and what we're doing or have done about them:

First: The extended downtime on patch day. It is possible for there to be errors in which a person appears, to the database, to be in two guilds. We had code in place to fix these states so that the new database could be clean of errors.
This code unfortunately took quite a while to run and revealed a few problems, but nothing so bad that it required a new build. To prevent this problem in the future, when we have major database-altering changes like this we'll be taking extra steps to better simulate live data.

Second: The crash on the US cluster after a short period of uptime. This particular issue was related to guild dimensions and on-demand guild loading. We rushed to get a fix out to prevent this issue and were successful. It took a bit longer than we would have liked, and resulted in the extended downtime for the rest of the evening. As for preventing this one? It unfortunately slipped through, and we can only aim to be more diligent in the future.

Third: The rolling restarts. Once the servers began to stay up, it was clear another problem was lurking: memory leakage. For those of you not familiar with this term: all programs, games included, require some amount of memory to run. When programs are done with that memory, they are supposed to free it. When they don't, the program uses more and more memory and eventually runs out, causing a crash. These are unfortunately very common problems in software. Thankfully, since identifying the problem we’ve been able to fix the code causing this particular memory leak, and that'll go live as soon as possible.

What you may not be aware of behind the scenes: during situations like this, where issues delay or prevent people from enjoying RIFT, numerous members of the dev team (including management, engineers, design, and QA) spend many long hours working to identify and resolve these problems.

We not only understand your frustration when RIFT is having troubles, we share it, each and every time. Thanks again for your patience. We hope this sheds a little light on some of the challenges we’ve faced over the last couple of days and what we’ve done to fix them.

There is however some good news at the end of all this.
With all these problems now handled, it looks like On-Demand Guild Loading has done what it set out to do, significantly improving our memory usage on the World Server.
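
For anyone curious what "only load guilds we actually need" can look like in code, here is a minimal Python sketch. It is not the RIFT server code: the class name `OnDemandGuildCache`, the `load_guild` callback, and the `max_loaded` cap are all hypothetical names made up for illustration, and the least-recently-used eviction is an assumption of this sketch rather than anything stated in the post.

```python
from collections import OrderedDict
from typing import Callable, Dict


class OnDemandGuildCache:
    """Hypothetical sketch: load a guild only when it is first requested."""

    def __init__(self, load_guild: Callable[[int], Dict], max_loaded: int = 1000):
        self._load_guild = load_guild    # assumed: fetches one guild from storage
        self._max_loaded = max_loaded    # assumed cap on guilds kept in memory
        self._loaded = OrderedDict()     # guild_id -> guild data, oldest use first

    def get(self, guild_id: int) -> Dict:
        if guild_id in self._loaded:
            # Already in memory: mark it as most recently used and return it.
            self._loaded.move_to_end(guild_id)
            return self._loaded[guild_id]
        # Not in memory yet: load it now, on demand.
        guild = self._load_guild(guild_id)
        self._loaded[guild_id] = guild
        if len(self._loaded) > self._max_loaded:
            # Release the least recently used guild so memory stays bounded.
            self._loaded.popitem(last=False)
        return guild

    def loaded_count(self) -> int:
        return len(self._loaded)


# Usage with a stand-in loader that records which guilds were actually loaded.
load_calls = []


def fake_load(guild_id: int) -> Dict:
    load_calls.append(guild_id)
    return {"id": guild_id, "members": []}


cache = OnDemandGuildCache(fake_load, max_loaded=2)
for requested in (1, 2, 1, 3):
    cache.get(requested)
```

In this sketch the second request for guild 1 is served from memory, so the loader runs only three times, and once a third guild is loaded the least recently used one is dropped. The hard part described in the post is that every caller must go through something like `get` instead of assuming every guild is already in memory.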