  Thread: Engineering Update

  1. #1
    Rift Team
    Join Date
    Jun 2014
    Posts
    677

    Engineering Update

    Hey everyone!

    We wanted to stop in and let you all know a little more about what’s been going on over the last several days in RIFT. As you may know, things are not working as smoothly as normal, and I'd like to tell you all why and what's being done to fix it. Recently we made an update to the way we handle guilds. Before 3.3, all guilds were loaded by the server regardless of whether they were needed. This caused high memory usage on what we call the World Server. This server handles coordination between all our different servers, acting as a sort of Grand Central Station. The more memory it uses, the worse off it is. With 3.3 we introduced a system we refer to internally as 'On-Demand Guild Loading'. The idea itself is simple: only load the guilds we actually need to work with. In practice it's harder, because our servers assumed (rightly, at the time) that all guilds were available all the time.
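
    For anyone curious what loading on demand can look like in code, here is a minimal Python sketch of the general idea. It is not the RIFT server code: `Guild`, `OnDemandGuildCache`, `load_from_fake_db` and the sample data are all made-up names for illustration, and a real server would also have to handle concurrency, persistence, and deciding when to unload.

```python
from typing import Callable, Dict, List


class Guild:
    """Hypothetical in-memory guild record."""

    def __init__(self, guild_id: int, members: List[str]) -> None:
        self.guild_id = guild_id
        self.members = members


class OnDemandGuildCache:
    """Loads a guild the first time it is requested instead of at startup."""

    def __init__(self, load_guild: Callable[[int], Guild]) -> None:
        self._load_guild = load_guild          # assumed database fetch
        self._loaded: Dict[int, Guild] = {}    # only guilds currently in use

    def get(self, guild_id: int) -> Guild:
        guild = self._loaded.get(guild_id)
        if guild is None:
            # First access: pay the loading cost now and remember the result.
            guild = self._load_guild(guild_id)
            self._loaded[guild_id] = guild
        return guild

    def unload(self, guild_id: int) -> None:
        # Drop a guild nobody needs any more so its memory can be reclaimed.
        self._loaded.pop(guild_id, None)

    def loaded_count(self) -> int:
        return len(self._loaded)


# Stand-in for the guild database (made-up data).
FAKE_DB = {1: ["alice", "bob"], 2: ["carol"], 3: ["dave"]}
load_calls: List[int] = []


def load_from_fake_db(guild_id: int) -> Guild:
    load_calls.append(guild_id)
    return Guild(guild_id, list(FAKE_DB[guild_id]))


cache = OnDemandGuildCache(load_from_fake_db)
cache.get(2)
cache.get(2)  # second request is served from memory, no reload
print(cache.loaded_count(), load_calls)
```

    The point of the design is that memory grows with the number of guilds actually in use, not with the number of guilds that exist.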

    In making a large game such as RIFT, different systems carry different amounts of risk, and this is the sort of change that comes with a fair bit of it. What makes it especially challenging is that many of the problems that could happen won't show up until the change is live. Guilds are like that: they are conglomerations of hundreds or even thousands of players, which is difficult if not impossible to fully simulate internally. Do not take this to mean we haven't tested this. We have tested it extensively, and this code has been live internally for months.

    There have been a few different problems you all have seen, so I'll go over the major server issues and what we're doing or have done about them:

    First: The extended downtime on patch day. It is possible for errors to occur in which a person appears, to the database, to be in two guilds. We had code in place to fix these states so that the new database would be free of such errors. Unfortunately this code took quite a while to run and revealed a few problems, though nothing so bad that it required a new build. To prevent this in the future, when we have major database-altering changes like this we'll take extra steps to better simulate live data.
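
    To make the "in two guilds" state concrete, here is a small Python sketch of how such records could be detected. It assumes a hypothetical membership table of (player, guild id) rows; the function name and the sample rows are made up, and it only finds the conflicts. How they were actually repaired is not described here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_multi_guild_players(
    rows: Iterable[Tuple[str, int]]
) -> Dict[str, Set[int]]:
    """Return every player that appears in more than one distinct guild."""
    guilds_by_player: Dict[str, Set[int]] = defaultdict(set)
    for player, guild_id in rows:
        guilds_by_player[player].add(guild_id)
    # A repeated row for the same guild is not a conflict, only distinct guilds are.
    return {p: g for p, g in guilds_by_player.items() if len(g) > 1}


# Made-up membership rows: alice is recorded in guilds 1 and 2.
rows = [("alice", 1), ("bob", 1), ("alice", 2), ("carol", 3), ("bob", 1)]
conflicts = find_multi_guild_players(rows)
print(conflicts)
```

    On a table with a very large number of rows, a pass like this (plus the repair step) is the kind of work that can take a long time to run.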

    Second: The crash on the US cluster after a short period of uptime. This particular issue was related to guild dimensions and on-demand guild loading. We rushed to get a fix out and were successful, but it took a bit longer than we would have liked and resulted in the extended downtime for the rest of the evening. As for preventing this one? Unfortunately it slipped through, and we can only aim to be more diligent in the future.

    Third: The rolling restarts. Once the servers began to stay up it was clear another problem was lurking: memory leakage. For those of you not familiar with this term: All programs, games included, require some amount of memory to run. When programs are done with that memory they are supposed to free it. When they don't, the program uses more and more memory and eventually runs out, causing a crash. These are unfortunately very common problems in software. Thankfully, we’ve since been able to fix the code causing this particular memory leak, and that fix will go live as soon as possible.
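
    As a toy illustration of a leak (again, not the actual bug), the Python sketch below shows a server that allocates a buffer per login and forgets to release it on logout, next to a fixed version. The class names, the 1 KB buffer, and the login/logout framing are assumptions made up for the example. In a garbage-collected language like Python, a "leak" usually means holding on to references that are no longer needed.

```python
from typing import Dict


class LeakyServer:
    """Keeps a buffer per logged-in player but never releases it."""

    def __init__(self) -> None:
        self._sessions: Dict[str, bytearray] = {}

    def login(self, player: str) -> None:
        self._sessions[player] = bytearray(1024)  # memory needed while online

    def logout(self, player: str) -> None:
        pass  # BUG: the buffer is never removed, so memory only ever grows

    def tracked_sessions(self) -> int:
        return len(self._sessions)


class FixedServer(LeakyServer):
    def logout(self, player: str) -> None:
        # Release the buffer once the player is done with it.
        self._sessions.pop(player, None)


def simulate(server: LeakyServer, logins: int) -> int:
    """Log players in and out, then report how many buffers are still held."""
    for i in range(logins):
        name = f"player{i}"
        server.login(name)
        server.logout(name)
    return server.tracked_sessions()


leaked = simulate(LeakyServer(), 1000)
freed = simulate(FixedServer(), 1000)
print(leaked, freed)
```

    Every player has logged out in both runs, yet the leaky version still holds one buffer per past login. Left running long enough, that growth is what eventually forces a restart.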

    What you may not be aware of is what happens behind the scenes: during situations like this, where issues delay or prevent people from enjoying RIFT, numerous members of the dev team (management, engineers, design, and QA) spend many long hours working to identify and resolve these problems.

    We not only understand your frustration when RIFT is having troubles, we share it, each and every time. Thanks again for your patience. We hope this sheds a little light on some of the challenges we’ve faced over the last couple of days and what we’ve done to fix them.

    There is, however, some good news at the end of all this. With these problems now handled, it looks like On-Demand Guild Loading has done what it set out to do: significantly improve our memory usage on the World Server.

  2. #2
    Rift Master Shedelin
    Join Date
    Jun 2014
    Posts
    682

    Thanks for the update.

  3. #3
    Ascendant Seshatar
    Join Date
    Jan 2012
    Posts
    2,244

    Does that mean less traffic costs or better performance?
    twitch.tv/seshatar | Twitter: @Seshatar | HoT Discord | Discord: Seshatar#1337

    <Skins on Farm> selling 700+ weapon wardrobe skins on [EU] (click for full list)

    IGNs: [EU] Volturnus@Gelidra / [NA] Seshatar@Faeblight | RAF: LMPNWLNLPHY2P6QCTWF6

  4. #4
    Rift Team
    Join Date
    Jun 2014
    Posts
    677

    Quote Originally Posted by Seshatar View Post
    Does that mean less traffic costs or better performance?
    It means a general boost to performance, yes. But the really big win is that we'll have more memory to work with. As a hypothetical case: let's say we had previously discovered some optimization involving our World Server that would require 1 GB of memory. Previously, memory was so tight we couldn't do that. With this change we almost certainly could (we'll have to wait and see how much memory is being used just before a downtime to say precisely how much we've gained, but I'm optimistic).

  5. #5
    Ascendant Seshatar
    Join Date
    Jan 2012
    Posts
    2,244

    Ah okay, thanks for your fast reply! I'm looking forward to a bright optimization future then
    twitch.tv/seshatar | Twitter: @Seshatar | HoT Discord | Discord: Seshatar#1337

    <Skins on Farm> selling 700+ weapon wardrobe skins on [EU] (click for full list)

    IGNs: [EU] Volturnus@Gelidra / [NA] Seshatar@Faeblight | RAF: LMPNWLNLPHY2P6QCTWF6

  6. #6
    General of Telara
    Join Date
    Mar 2014
    Posts
    967

    Quote Originally Posted by Snedhepl View Post
    Memory leakage. For those of you not familiar with this term: All programs, games included, require some amount of memory to run. When programs are done with that memory they are supposed to free it. When they don't, the program uses more and more memory and eventually runs out, causing a crash.
    No offense, but this made me giggle...
    Isn't the Rift client itself a prime example of leaking memory like an old sieve? How could any of us not be familiar with it?

  7. #7
    Shadowlander
    Join Date
    Apr 2011
    Posts
    48

    Thank you very much for posting this, Snedhepl. As an engineer myself on a live service, I can well appreciate that the past few days have been as un-fun for the Rift team as they have been for the players.

    I always enjoy hearing war stories from other engineering teams (usually shared over a pint, mind!), but as a player, knowing what is going on, what went wrong, and what's being done to fix it goes a long way toward reducing the frustration levels. So virtual pints to you and your team, and I hope we're over the worst of the instability for 3.3.

  8. #8
    Ascendant Narcise
    Join Date
    Jul 2014
    Posts
    2,468

    Thanks, Sned. Appreciate the info!
    People who circumscribe their lives to conform to the narrow expectations of others get exactly what they deserve.

  9. #9
    Rift Team
    Join Date
    Jun 2014
    Posts
    677

    Quote Originally Posted by Dotcher View Post
    I always enjoy hearing war stories from other engineering teams
    I once worked in an office where my room's thermostat was controlled by the thermostat used for the server room. It took quite a bit of literal cardboard and literal duct tape to fix that issue.

    And some gloves.

  10. #10
    General of Telara
    Join Date
    Jan 2011
    Posts
    973

    Very informative and interesting post.
    It must have been pretty hectic for you guys these last few days. I appreciate all the hard work you guys are putting in. Thank you.

  11. #11
    General of Telara
    Join Date
    Mar 2014
    Posts
    967

    Quote Originally Posted by Snedhepl View Post
    I once worked in an office where my room's thermostat was controlled by the thermostat used for the server room. It took quite a bit of literal cardboard and literal duct tape to fix that issue.

    And some gloves.
    Holy *beep*...which genius architect came up with that?? I mean that's close to bodily harm (if I looked up the correct term)
    I did some server maintenance tasks in an internship, and I sure wouldn't want to work at server room temperatures all day, no matter which time of the year.

  12. #12
    Champion Kat Fantastic
    Join Date
    Mar 2011
    Posts
    563

    Thanks for the information, Snedhepl. o:
    Thunderkat (65) - Tekay (65) - Pfifer (65) - Visionary (65) - Rhailo (65)


    <Bounty>'s GL - NA chillest.

  13. #13
    Rift Team
    Join Date
    Jun 2014
    Posts
    677

    Quote Originally Posted by Lynx3d View Post
    Holy *beep*...which genius architect came up with that?? I mean that's close to bodily harm (if I looked up the correct term)
    I did some server maintenance tasks in an internship, and I sure wouldn't want to work at server room temperatures all day, no matter which time of the year.
    It was a smallish company and didn't own the building. The building wasn't architected with a real server room/climate control divide, and enough hiring had happened that the office filled up. At the time we were also preparing to move to new offices.

  14. #14
    Champion of Telara lynspottery
    Join Date
    Jan 2011
    Location
    Snellville, GA USA
    Posts
    1,347

    Quote Originally Posted by Snedhepl View Post
    I once worked in an office where my room's thermostat was controlled by the thermostat used for the server room. It took quite a bit of literal cardboard and literal duct tape to fix that issue.

    And some gloves.
    lol... and were you forced to wear a wool scarf around your neck too? I have this image:

    [Attached image: Engineering Update-snowman.jpg]

  15. #15
    Ascendant Nuuli
    Join Date
    Jan 2012
    Posts
    2,004

    Quote Originally Posted by Snedhepl View Post
    Hey everyone!


    <BIG SNIP!>

    What you may not be aware of is what happens behind the scenes: during situations like this, where issues delay or prevent people from enjoying RIFT, numerous members of the dev team (management, engineers, design, and QA) spend many long hours working to identify and resolve these problems.

    We not only understand your frustration when RIFT is having troubles, we share it, each and every time. Thanks again for your patience. We hope this sheds a little light on some of the challenges we’ve faced over the last couple of days and what we’ve done to fix them.

    There is, however, some good news at the end of all this. With these problems now handled, it looks like On-Demand Guild Loading has done what it set out to do: significantly improve our memory usage on the World Server.
    Yadayadayada...

    Okay, good job! I know that there wasn't any lollygagging going on. Glad to hear that you've tackled that memory usage issue. I recall a very large guild from here went to Guild Wars 2 and crashed it at launch due to the sheer size of the guild.

    I appreciate that this effort is also based on improving the handling of the game overall. Every little bit helps.

    Thanks for sharing!

    Okay Ocho? Please escort the engineer back into the dungeon and get back to work, tackling all those other issues on the board. LOL!
