Wednesday, November 3, 2010

Poison Mailbox Detection and Correction

One of the biggest pains I have faced in my career is troubleshooting an Exchange server issue caused by a single mailbox or user. An alert system flooding a mailbox, an IMAP user with tons of folders, the lovely iPhone 4.0 release: in each case, a problem with one mailbox ended up hurting many users.

Well, Exchange 2010 has added some preventative measures to stop those rogue users in their tracks. Poison mailbox detection is one of those new measures. It is handled by the information store and stops all access to a mailbox unless the OPEN_AS_ADMIN flag is passed.

Now the who, what, when and where:

A mailbox is considered poison when it causes a crash or deadlock three times within two hours. Both the number of crashes that leads to quarantining a mailbox and how long the mailbox stays quarantined are configurable. You can modify MailboxQuarantineCrashThreshold (default: 3 crashes) and MailboxQuarantineDurationInSeconds (default: 21600 seconds, i.e. six hours) in the following path:

 HKLM\SYSTEM\CurrentControlSet\Services\MSExchangeIS\<Server Name>\Private-{db guid}\QuarantinedMailboxes
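The counting rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the store's actual implementation: the function and constant names are made up for the example, and the defaults simply mirror the values quoted above.

```python
from datetime import datetime, timedelta

# Defaults quoted in the post. The names are illustrative, not Exchange APIs.
CRASH_THRESHOLD = 3                  # MailboxQuarantineCrashThreshold
QUARANTINE_SECONDS = 6 * 60 * 60     # MailboxQuarantineDurationInSeconds
CRASH_WINDOW = timedelta(hours=2)    # window in which crashes are counted


def is_poison(crash_times, now, threshold=CRASH_THRESHOLD, window=CRASH_WINDOW):
    """Return True if at least `threshold` crashes fall within `window` before `now`."""
    recent = [t for t in crash_times if timedelta(0) <= now - t <= window]
    return len(recent) >= threshold


def quarantine_ends(quarantined_at, duration_seconds=QUARANTINE_SECONDS):
    """Return the time at which a quarantined mailbox would be released."""
    return quarantined_at + timedelta(seconds=duration_seconds)
```

With the defaults, three crashes inside two hours trip the check, and a mailbox quarantined at noon would be released at 18:00.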

There are two conditions that the store considers a "poison-able" offense:

  • a thread that is doing work for that mailbox crashes

  • more than 5 threads in that mailbox have not made progress for a long time

When either happens, the mailbox is tagged, along with a count of how many times it has been tagged, in a registry key at the following location:

     HKLM\SYSTEM\CurrentControlSet\Services\MSExchangeIS\<Server Name>\Private-{db guid}\QuarantinedMailboxes\{mailbox guid}

You will see two values under that key: CrashCount and LastCrashTime.

An event will also be logged on the mailbox server with event ID 10018, detailing the user and the time of the quarantine.

During a database mount, the Exchange store reads the time at which each mailbox was identified as a potential threat. If more than two hours have elapsed, the registry key for that mailbox is wiped out.
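As a rough sketch of that mount-time cleanup, assuming the last crash times have already been read into a dictionary keyed by mailbox GUID (the function name and the dictionary shape are hypothetical, chosen only for this example):

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(hours=2)  # age quoted in the post


def entries_to_wipe(last_crash_times, mount_time, stale_after=STALE_AFTER):
    """Return the mailbox GUIDs whose last crash is more than `stale_after` old."""
    return [
        guid
        for guid, last_crash in last_crash_times.items()
        if mount_time - last_crash > stale_after
    ]
```

Anything this returns corresponds to a mailbox whose registry key the store would clear at mount time. Entries newer than two hours are left alone.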

After you have found the cause of the crashes and rectified the problem, you can reset the mailbox by deleting the registry key for the quarantined mailbox. Unfortunately, you will need to either remount the database or restart the information store for the reset to take effect.
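If you would rather script the reset than click through regedit, something along these lines should work. Treat it as an untested sketch: the helper names are my own, it uses Python's standard winreg module (Windows only), and it assumes you pass the server name and GUIDs exactly as they appear in the registry on your server.

```python
def quarantine_key_path(server_name, db_guid, mailbox_guid):
    """Build the subkey path (relative to HKLM) for one quarantined mailbox."""
    return "\\".join([
        "SYSTEM", "CurrentControlSet", "Services", "MSExchangeIS",
        server_name, f"Private-{db_guid}", "QuarantinedMailboxes", mailbox_guid,
    ])


def reset_quarantine(server_name, db_guid, mailbox_guid):
    """Delete the mailbox's quarantine key. Run elevated on the mailbox server."""
    import winreg  # imported here so the path helper stays usable off Windows

    path = quarantine_key_path(server_name, db_guid, mailbox_guid)
    parent, _, leaf = path.rpartition("\\")
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, parent, 0,
                        winreg.KEY_ALL_ACCESS) as key:
        winreg.DeleteKey(key, leaf)  # fails if the key still has subkeys
```

Deleting the key is only half of the job: the database still has to be remounted, or the information store restarted, before the mailbox is released.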

UPDATE: I posted a new blog post that provides a script to find users that have been quarantined. You can find that here.