
Thread: 1.9 Watchdog alternatives

  1. #1
    Plane Touched
    Join Date
    Feb 2012
    Posts
    228

    1.9 Watchdog alternatives

    This post from Zorba announces a critical breaking change in the addon system.

    I don't know what stage of development it is at right now, but it'll probably be added this year and our addons will have to deal with it. Even though it seems necessary and useful to keep the game running smoothly, and most of us won't care as it won't break our addons (because we are such great programmers and won't hit the limitation), it has the potential to break any addon in unexpected ways. My main concern is that it can stop execution at any arbitrary point, probably leaving the addon in an inconsistent state that would result in unexpected behaviour and a constant error dump to the player's chat.

    I'm sure Zorba will come up with the best solution to this problem. However, I find it unlikely that it will be changed after it has been released and things start breaking for no apparent reason, so this thread aims to collect ideas on how to make the performance watchdog less intrusive before it goes live. Even if only the least of them helps Zorba make it less restrictive, it will be worthwhile. We're constantly discussing even the most trivial things, so please contribute any idea you might have on this topic.

    Please don't reply by saying you can use coroutines and optimize your code. We have no control over which addons a player will have installed, and even a single one of them failing could wreak havoc in the addon environment, affecting the others no matter how carefully they have been coded.
    Last edited by Baanano; 04-13-2012 at 03:25 PM.

  2. #2
    Plane Touched
    Join Date
    Feb 2012
    Posts
    228


    In this post I'll try to explain my idea for a clean alternative that keeps addon state consistent even when one of them fails as a result of the performance watchdog stopping it.

    I'll start with my assumptions on how the addon system works and how it will work when the watchdog is implemented. After that I'll post an alternative solution and further refinements on it. Each section will build on the previous one, so if any assumption is wrong, or any proposal impossible to implement, later sections could be wrong too.

    1.- Current addon system

    Right now the addons are executed in the same thread as the game, interleaved with UI code.

    They can take almost any resource they need freely and consume almost as much CPU as they want. But sometimes, some of them start consuming too much, lowering the framerate or even crashing the client.

    2.- The evil watchdog

    As the UI is more important than the addon system, the performance watchdog will be introduced to monitor addon activity. Those addons that consume too much will be warned and, if they continue degrading performance, they'll be stopped at an arbitrary point of their execution.

    That is, this watchdog first barks and then bites. But when it bites it doesn't care whom it's biting, nor whether it's biting your shoes or ripping your head off. So, in this new environment, addons are forced to behave well or face mutilation.

    My assumptions on how the watchdog works:

    - When the UI thread begins updating a frame, the watchdog is set to execute in another thread and put to sleep for some time before kicking in.

    - If the frame update ends before the watchdog awakes, everything is ok and both the UI and the addon system are happy.

    - But if the watchdog awakes while the UI thread is still running, it'll terminate the currently running addon code via LuaError and let the UI continue its execution.

    The problem here isn't that the addon code errors out, but that it can be terminated in the middle of a critical section, destroying the addon state in an unrecoverable way.
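
    To make the concern concrete, here is a minimal sketch in Python (not RIFT addon code). It assumes the watchdog can be modelled as a step budget instead of a real timer, so the result is deterministic. Budget, WatchdogError and transfer are hypothetical names made up for the illustration; WatchdogError stands in for the LuaError mentioned above.

```python
class WatchdogError(Exception):
    """Stands in for the LuaError the watchdog is assumed to raise."""


class Budget:
    """Hypothetical stand-in for the watchdog: counts steps instead of time."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def tick(self):
        self.used += 1
        if self.used > self.limit:
            raise WatchdogError("addon exceeded its budget")


def transfer(state, amount, budget):
    """Moves `amount` between two counters; the invariant is a constant total."""
    budget.tick()
    state["bank"] -= amount      # first half of the critical section
    budget.tick()                # the watchdog may fire right here
    state["bag"] += amount       # second half never runs if it fired


state = {"bank": 100, "bag": 0}
budget = Budget(limit=3)
try:
    for _ in range(5):
        transfer(state, 10, budget)
except WatchdogError:
    pass                         # the addon is stopped, but its state survives

print(state, "total =", state["bank"] + state["bag"])
```

    The second transfer is cut between its two halves, so the total drops from 100 to 90 and nothing in the addon knows about it. Every later event handler then works on that broken state.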

    3.- Friendly watchdog

    The first priority would be to modify the watchdog so that, even if addon code is terminated, it isn't left in an inconsistent state. That is, let it bite, but without crippling our addons.

    To accomplish this, the frame update procedure could be changed this way:

    1) Every time a frame finishes rendering (after Event.System.Update.End), save the [consistent] Lua state.

    2) Prepare the watchdog to awake once the addon time limit is reached.

    3) Execute the addon system code until the next frame is ready to be rendered (after Event.System.Update.Begin) or the watchdog awakes. Every time a system event is fired (which could transfer execution to addon code), track how much time each addon identifier consumes.

    If the time limit wasn't hit, go back to #1. If the watchdog awakes before the frame update is complete, stop addon code execution and proceed to #4.

    4) Identify the addon that has had the highest CPU consumption during #3, then restore the Lua state (saved in #1), reschedule the watchdog (#2) and restart the addon system execution (#3), but this time don't call any event hook registered by the offending addon. Repeat this "disabling" of addons until the process finishes without the watchdog kicking in and a consistent Lua state can be saved (#1).

    5) When the addon system execution begins for the next frame (#3), "re-enable" any addon disabled during the last frame and warn them through an event so they know they were skipped (losing any event fired during it) and need to lower their consumption. This way, they can choose to turn off optional features (graceful degradation).

    6) If any addon keeps hitting the time cap for several frames, show the player a dialog so he can decide whether he wants to turn off that addon: "[Addon Name] is degrading game performance. Do you want to disable it? {Yes|No}".

    If he chooses to keep it enabled, don't show any further dialog during the rest of the session, as the player made the informed choice to keep the addon running even though it's consuming too much.

    If he chooses to disable the addon, don't call any further event hook registered by the addon during the rest of the session.

    While the dialog is being shown, keep the addon running. The game will have a terrible framerate, but this would help the player to understand the consequences of keeping it enabled.

    As this dialog could sometimes be too intrusive during combat, you could just temporarily disable the addon without player consent (or better, add an interface config option so the player can choose how the game behaves if this happens, with "Temporarily disable addons that consume too much CPU during combat" as the default setting). A message could be shown in the chat panel informing the player that the addon has been temporarily disabled. Once the player gets out of combat, either show the dialog or just re-enable the addon.
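
    Steps 1 to 4 above can be sketched roughly as follows. This is a Python simulation under stated assumptions, not how the client works: the Lua state is a plain dict that gets deep-copied, time is a step counter, and each addon has a single handler. run_frame, make_tick and the addon names are hypothetical. The warning event from step 5 and the dialog from step 6 are left to the caller, which gets back the set of skipped addons.

```python
import copy


class WatchdogError(Exception):
    """Raised when the frame's step budget (a stand-in for time) runs out."""


def run_frame(state, handlers, limit):
    """Runs one frame; returns (new_state, skipped_addon_ids).

    handlers maps an addon identifier to a function (state, tick) -> None.
    """
    snapshot = copy.deepcopy(state)          # step 1: save the consistent state
    skipped = set()
    while True:
        work = copy.deepcopy(snapshot)       # step 4: restart from the snapshot
        usage = {addon: 0 for addon in handlers}
        total = 0

        def make_tick(addon):
            def tick():
                nonlocal total
                usage[addon] += 1            # step 3: per-addon accounting
                total += 1
                if total > limit:            # step 2: the watchdog "awakes"
                    raise WatchdogError(addon)
            return tick

        try:
            for addon, handler in handlers.items():
                if addon not in skipped:
                    handler(work, make_tick(addon))
            return work, skipped             # frame completed within budget
        except WatchdogError:
            active = [a for a in handlers if a not in skipped]
            worst = max(active, key=lambda a: usage[a])
            skipped.add(worst)               # step 4: skip the top consumer


def light(state, tick):
    tick()
    state["light"] += 1


def heavy(state, tick):
    state["heavy_started"] = True            # half-done work to be rolled back
    for _ in range(1000):
        tick()
    state["heavy_done"] = True


state, skipped = run_frame({"light": 0}, {"Light": light, "Heavy": heavy}, limit=50)
print(state, skipped)
```

    The half-written "heavy_started" flag doesn't survive the rollback, and the well-behaved addon still gets its event. Note that the addon running when the budget expires isn't necessarily the one that gets skipped; the choice is made on accumulated usage, as in step 4.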

    4.- Problems that this solution may create

    1) Cloning the full Lua state to save it each time a frame finishes rendering can take a lot of time, degrading performance.

    I haven't thought about how to solve this if it happens, but we could try to find a solution.

    2) Commands (Command.*) that are fired by addons before the watchdog rolls back the Lua state and disables them could affect game state without the addon knowing it has issued them.

    To prevent this, any command could be queued and executed only when the frame update procedure has been completed successfully (after Event.System.Update.End). If the watchdog rolls back the Lua state, this queue would be cleared and those commands discarded.
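
    A minimal sketch of that queue, again in Python and only as an illustration. CommandQueue, its methods and the "chat" command name are hypothetical; the real Command.* functions would sit behind the execute callable.

```python
class CommandQueue:
    """Buffers addon commands until the frame is known to be consistent."""

    def __init__(self, execute):
        self._execute = execute   # callable that really performs a command
        self._pending = []

    def issue(self, name, *args):
        """Called by addon code in place of running the command immediately."""
        self._pending.append((name, args))

    def commit(self):
        """Frame finished cleanly: run everything in the order it was issued."""
        pending, self._pending = self._pending, []
        for name, args in pending:
            self._execute(name, args)

    def rollback(self):
        """Watchdog fired: drop commands issued by the discarded execution."""
        self._pending.clear()


executed = []
queue = CommandQueue(lambda name, args: executed.append((name, args)))

queue.issue("chat", "hello")      # issued during a frame that gets rolled back
queue.rollback()
queue.issue("chat", "hello")      # re-issued during the successful rerun
queue.commit()
print(executed)
```

    The command reaches the game once, even though the addon code that issued it ran twice. The price is that commands take effect at the end of the frame rather than immediately.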

    3) Addons could try to bypass this simple disabling approach by hooking API calls instead of events. This would be considered rogue behaviour and the community should be able to identify those addons and warn other players to turn those addons off or uninstall them.

    That wouldn't address disabled addon code being called by other addons. However, I think there are only two ways to do that:

    a) Using Utility.Dispatch to credit the execution time to the other addon. This API call should fail if the addon identifier has been disabled.

    b) Calling the disabled addon's functions directly. This way the execution time would be credited to the calling addon, so any addon doing this is taking full responsibility, and should be disabled by the watchdog if it exceeds the time limit as a result of these function calls.

    4) The memory of disabled addons is leaked, as it can't be reclaimed by the garbage collector.
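
    Both cases can be sketched with a small accounting object. This is Python with made-up names (Accounting, run, dispatch, tick); in particular, dispatch here is only a loose stand-in for Utility.Dispatch, and its signature is an assumption, not the real API.

```python
class DisabledAddonError(Exception):
    """Raised when time would be credited to a disabled addon identifier."""


class Accounting:
    def __init__(self):
        self.usage = {}          # addon identifier -> steps consumed
        self.disabled = set()
        self._current = None     # identifier currently being billed

    def run(self, addon, func):
        """Runs func as an event hook of `addon`, billing its steps to it."""
        previous, self._current = self._current, addon
        try:
            return func()
        finally:
            self._current = previous

    def dispatch(self, addon, func):
        """Like run, but refuses to credit time to a disabled identifier."""
        if addon in self.disabled:
            raise DisabledAddonError(addon)
        return self.run(addon, func)

    def tick(self, steps=1):
        """Bills `steps` to whichever identifier is currently running."""
        self.usage[self._current] = self.usage.get(self._current, 0) + steps


acct = Accounting()
acct.disabled.add("Heavy")


def heavy_work():
    acct.tick(40)


def caller_hook():
    acct.tick(1)
    heavy_work()                              # case b: billed to the caller
    try:
        acct.dispatch("Heavy", heavy_work)    # case a: rejected
    except DisabledAddonError:
        pass


acct.run("Caller", caller_hook)
print(acct.usage)
```

    All 41 steps land on "Caller" and nothing is credited to the disabled "Heavy", so the calling addon is the one the watchdog would disable next if it goes over the limit.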

    5.- Conclusion

    If something similar to this can be implemented, addons wouldn't be left in an inconsistent state by the watchdog.

    The framerate would probably drop whenever the watchdog needs to roll back the Lua state, but players would have a way to make an informed choice between accepting the performance hit the addon is causing and disabling the addon. I find this better than having the addons behave in unexpected ways or dump errors to the chat panel.

    As for us addon authors, we would need to keep in mind that our addons could lose some events if we try to consume too much CPU, but I personally find that better than having unexpected errors during execution.

    We'd have to optimize our most time-consuming code sections anyway, but at least we wouldn't need to face players' bug reports that make no sense (how can my code have failed in this line? it's just a multiplication!).

    I'm probably totally wrong in my assumptions, or there are things I haven't considered that would invalidate this approach. However, from my point of view, not knowing that something is impossible is the key to making it possible.

    Hope you find something useful in this wall of text, and please point out any hole you find in it or any idea on how to improve it.
