Mitigating Azure MFA Outages

With the recent MFA outages (twice in the last two weeks), the question has come up in conversation several times – what do we do when MFA is down, and when our users can’t log in?

As you might expect, there are several approaches you can take that allow you to maintain the balance between security and allowing your users to keep working, even when the service is not functioning properly.

Option 1: Configure a global admin “break glass” account.

This is a great option, and endorsed by Microsoft as a best practice for overall tenant management and security – just keep a couple of things in mind:

  1. Make sure that this account is properly locked down and not tied to a specific individual. If you can lock it down as tight as Microsoft recommends (credentials kept separate, locked in a fireproof safe) then great – if not, at least make sure that you make the password suitably difficult, and keep these credentials on lockdown.
  2. Create this account on your tenant’s @domain.onmicrosoft.com domain rather than on a federated domain. If you’re using ADFS or PTA and you’re unable to authenticate back on premises because of issues with your federated domain, a cloud-only account will still allow you to log in to your tenant using Microsoft authentication, rather than your on prem infrastructure (which might be down as well).
  3. Exclude this admin account from MFA Policies. There are additional ways that you can lock down this emergency account, such as creating a Conditional Access Policy that ensures that this account can only be used from a domain joined PC, on your corporate network, etc. Just remember that you might still be painting yourself into a corner if the Conditional Access service goes down and you have no way of validating your access conditions. You always have to balance how many controls you put in place against how much of a back door you want to leave behind – your security and risk posture is really going to be the deciding factor here.

Exclude your emergency account from MFA Policies

Let’s say for instance that I have my baseline policy enabled: Require MFA for admins (because I do). Automatically all my global admins are required to use MFA, and my security posture has gotten a bit stronger, along with my Secure Score.


However, since I want to make sure my emergency account is able to access the system even if MFA is down, I simply set an exclusion on the policy so that this account does not require MFA.


Like I mentioned before, you can configure other conditional access policies that might further limit this account – maybe you might want to ensure that it’s only logging on from a trusted location, trusted machine, etc. In this case, I simply have this account excluded from the policy, and I use my own security rigor to control access to this account.

Audit access to this account and configure alerts

Audit this account’s sign-ins on a regular basis to confirm that it’s not being misused (or, ideally, that it’s not being used at all). If you have access to Cloud App Security (requires EM+S E5 licensing on your tenant), you can configure an activity policy that will alert specified people any time this account is used.


Click the button to Edit and preview results in order to test your filters and ensure they’re working properly.


Once you are satisfied that your activity filters are catching the required account logon activities, configure the policy to send an alert by email or text message.


Creating your emergency access / break glass admin account is going to cover a number of scenarios, and not just the MFA service being down. The point of having this account, of course, is that you can log in to Azure AD and disable your conditional access policy which requires MFA, and get users logging back in again.

Option 2: Create an MFA exclusion group

Another option that addresses the specific scenario of quickly and easily allowing your users to log in to Office 365 without requiring MFA is to create a security group on prem, sync it up to Azure AD, and set it as an exclusion on your MFA Policy.

For example, I’ve created two groups in Active Directory named MFA_Enabled and MFA_Disabled to keep things simple:


These two groups are used when I create my MFA Conditional Access Policy – one to target specific users for MFA, and the other for my MFA exclusion group.
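If you’d rather script the group creation than click through Active Directory Users and Computers, the following is a minimal sketch. It assumes the ActiveDirectory PowerShell module is available; the Global scope and the description text are my own assumptions, and because no -Path is given the groups land in the domain’s default container, so make sure that location is in scope for your AAD Connect sync.

```powershell
# Sketch only: group scope and description are assumptions, adjust to your standards.
foreach ($name in 'MFA_Enabled', 'MFA_Disabled') {
    # Skip creation if a group with this name already exists
    if (-not (Get-ADGroup -Filter "Name -eq '$name'")) {
        New-ADGroup -Name $name `
                    -GroupScope Global `
                    -GroupCategory Security `
                    -Description 'Used by the MFA Conditional Access policy'
    }
}
```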

Conditional Access Policy: Include Users and groups


Conditional Access Policy: Exclude users and groups


Once you have this set up, you can keep your MFA_Disabled group empty, and just have it ready in case something like this happens again. As soon as you get a report that MFA is down, and your users are impacted, you can simply move your affected users (or all of them if you prefer) to the MFA_Disabled group and run a quick sync in AAD Connect.

If you like, you can even use a simple script like this to move your users from the MFA Enabled to the Disabled group:

# Get the current members of the MFA_Enabled group
$mfaEnabled = Get-ADGroupMember -Identity "MFA_Enabled"

# Move each member out of MFA_Enabled and into MFA_Disabled
foreach ($m in $mfaEnabled) {
    Remove-ADGroupMember -Identity "MFA_Enabled" -Members $m.SamAccountName -Confirm:$false
    Write-Host "Moving $($m.Name) to the MFA_Disabled group" -ForegroundColor Yellow
    Add-ADGroupMember -Identity "MFA_Disabled" -Members $m.SamAccountName
}

And then just do the reverse to bring them back in to the MFA Enabled group:

# Get the current members of the MFA_Disabled group
$mfaDisabled = Get-ADGroupMember -Identity "MFA_Disabled"

# Move each member out of MFA_Disabled and back into MFA_Enabled
foreach ($u in $mfaDisabled) {
    Remove-ADGroupMember -Identity "MFA_Disabled" -Members $u.SamAccountName -Confirm:$false
    Write-Host "Moving $($u.Name) to the MFA_Enabled group" -ForegroundColor Green
    Add-ADGroupMember -Identity "MFA_Enabled" -Members $u.SamAccountName
}

Now obviously you need to be running these cmdlets from a system with access to your Active Directory, with admin rights to move users from one group to another. You can also make this part of a more complete script by adding logging, parameters to -enableMFA or -disableMFA, etc. These are just simple building blocks to get you started. 🙂
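As one way of combining those building blocks, here is a rough sketch of a single function with a mode switch and basic logging. The function name Move-MfaGroupMember, its parameters, and the log file path are all hypothetical names I made up for illustration; the group names are the ones used above. Treat it as a starting point rather than a finished tool.

```powershell
# Sketch only: Move-MfaGroupMember and its parameters are hypothetical names.
# Requires the ActiveDirectory module and rights to edit both groups.
function Move-MfaGroupMember {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        # Disable = move users into the exclusion group; Enable = move them back
        [Parameter(Mandatory)]
        [ValidateSet('Enable', 'Disable')]
        [string]$Mode,

        [string]$EnabledGroup  = 'MFA_Enabled',
        [string]$DisabledGroup = 'MFA_Disabled',
        [string]$LogPath       = '.\Move-MfaGroupMember.log'
    )

    if ($Mode -eq 'Disable') {
        $source = $EnabledGroup;  $target = $DisabledGroup
    } else {
        $source = $DisabledGroup; $target = $EnabledGroup
    }

    # Only move user objects; nested groups and computers are left alone
    $members = @(Get-ADGroupMember -Identity $source |
                 Where-Object { $_.objectClass -eq 'user' })

    foreach ($m in $members) {
        if ($PSCmdlet.ShouldProcess($m.SamAccountName, "Move from $source to $target")) {
            # Add first, so a failed add stops the script before the user is removed
            Add-ADGroupMember    -Identity $target -Members $m.SamAccountName -ErrorAction Stop
            Remove-ADGroupMember -Identity $source -Members $m.SamAccountName -Confirm:$false -ErrorAction Stop

            # Append a timestamped line to the log file
            $line = '{0:s} Moved {1} from {2} to {3}' -f (Get-Date), $m.SamAccountName, $source, $target
            Add-Content -Path $LogPath -Value $line
            Write-Host $line
        }
    }
}
```

You could run it with -WhatIf first to see which accounts would move, for example Move-MfaGroupMember -Mode Disable -WhatIf, and then run it again without -WhatIf once the list looks right. Adding to the target group before removing from the source means a user is briefly in both groups; since the Disabled group is an exclusion on the policy, that should err on the side of letting the user in.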

When you’re done, initiate a quick sync using the following command from your AAD Connect server:

Start-ADSyncSyncCycle -PolicyType Delta
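If a scheduled sync happens to be running when you kick this off, the command will fail with an error that a sync is already in progress. A small wrapper like the one below waits for the current cycle to finish first. The function name Start-DeltaSyncWhenIdle and the 10 second polling interval are my own assumptions; Get-ADSyncScheduler and Start-ADSyncSyncCycle come from the ADSync module that ships with AAD Connect.

```powershell
# Sketch only: Start-DeltaSyncWhenIdle is a hypothetical helper name.
# Run this on the AAD Connect server, where the ADSync module is installed.
function Start-DeltaSyncWhenIdle {
    param(
        # How long to wait between checks (arbitrary default)
        [int]$PollSeconds = 10
    )

    # Wait until no sync cycle is in progress
    while ((Get-ADSyncScheduler).SyncCycleInProgress) {
        Write-Host 'A sync cycle is already running, waiting...' -ForegroundColor Yellow
        Start-Sleep -Seconds $PollSeconds
    }

    # Push the group membership changes up to Azure AD
    Start-ADSyncSyncCycle -PolicyType Delta
}

Start-DeltaSyncWhenIdle
```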

Within a few minutes you can easily exclude some or all of your users from MFA, and just as quickly re-enable it for them, even if you can’t log into the portal yourself. Since this action is completely triggered from your server on prem, I believe it gives you an avenue to very quickly re-establish access should another MFA outage occur.

Hopefully this helps give you some ideas around how best to manage and control access to your Office 365 tenant, even when certain parts of it are outside of your control (looking at you, MFA!) – good luck!
