Insights

Operational guidance for the moments that matter.

Field notes from cybersecurity, compliance, and infrastructure work. Each piece is short, specific, and grounded in what we have actually seen.

DNS sits underneath everything. Email, web traffic, SaaS authentication, VPN access, voice systems, payment processors. When DNS resolution fails, the systems above it don't return errors that say "DNS is broken." They return timeouts, certificate errors, generic 500s, and login loops. Frontline staff start opening tickets. Leadership starts asking what happened. Hours pass before someone identifies the actual cause.

The two failure modes that produce most outages

Across mid-market environments, two patterns account for the majority of DNS-related disruptions we see in incident response:

One — single-provider dependency. Many organizations run all authoritative DNS through a single registrar or provider. When that provider has a regional outage, a configuration error, or a denial-of-service incident, the entire DNS footprint goes dark. The fix is straightforward: authoritative DNS should be served from at least two independent providers with different infrastructure. Cost is modest. Operational benefit is substantial.
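
As a quick self-check, a script along the following lines surfaces zones served by a single provider. This is a minimal sketch using dnspython; the domain is a placeholder, and grouping nameservers by parent domain is only a rough proxy for the operating provider.

```python
import dns.resolver  # pip install dnspython

def ns_providers(domain: str) -> dict:
    providers: dict = {}
    for record in dns.resolver.resolve(domain, "NS"):
        host = str(record.target).rstrip(".")
        # Group nameservers by parent domain as a rough proxy for the
        # operating provider (ns1.dns-host.com -> dns-host.com).
        provider = ".".join(host.split(".")[-2:])
        providers.setdefault(provider, []).append(host)
    return providers

found = ns_providers("example.com")  # placeholder domain
for provider, hosts in found.items():
    print(provider, hosts)
if len(found) < 2:
    print("WARNING: all authoritative DNS appears to sit with one provider")
```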

Two — recursive resolver fragility. Internal recursive resolvers, often running on the same hardware as Active Directory or other infrastructure, fail in ways that are difficult to detect until they cascade. A single overloaded resolver can degrade performance for an entire office. Health checks on resolvers, redundant paths to public resolvers, and clear failover procedures matter more than most internal IT teams realize.
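
A minimal health-check sketch, again assuming dnspython, with placeholder resolver addresses, a placeholder probe name, and an illustrative 200 ms threshold:

```python
import time
import dns.resolver

RESOLVERS = ["10.0.0.53", "10.0.1.53"]  # placeholder internal resolvers
PROBE_NAME = "example.com"              # a name every resolver should answer
TIMEOUT_S = 2.0

for ip in RESOLVERS:
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = [ip]
    resolver.lifetime = TIMEOUT_S
    start = time.monotonic()
    try:
        resolver.resolve(PROBE_NAME, "A")
        elapsed_ms = (time.monotonic() - start) * 1000
        # 200 ms is an illustrative threshold; tune to your environment.
        print(f"{ip}: {'OK' if elapsed_ms < 200 else 'SLOW'} ({elapsed_ms:.0f} ms)")
    except Exception as exc:
        print(f"{ip}: FAILED ({exc})")
```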

What to actually do

  • Inventory authoritative DNS providers. Confirm at least two are in active use.
  • Test failover. Pick a non-business-hours window and disable the primary; confirm the secondary serves traffic correctly.
  • Document the resolver path on each network segment and validate that downstream caches refresh on schedule.
  • For organizations of any size with regulatory or operational sensitivity, add denial-of-service mitigation at the authoritative layer; many providers include this at no additional cost.
  • Treat DNS configuration changes with the same change-control discipline as firewall rules. The blast radius is comparable.

For organizations evaluating their current DNS posture, we are available for short architectural reviews focused specifically on resilience and failure-mode analysis.

Organizations approaching their first SOC 2 audit often spend months on policy authorship, framework selection, and tool procurement. By the time the auditor arrives, the policies look polished. The challenge is that auditors do not assess policies. They assess whether the controls described in the policies actually operate, with reproducible evidence.

The four areas auditors examine first

Access reviews. Auditors expect periodic access reviews, typically quarterly, and they will ask for the review artifacts. They want to see who reviewed access, when, what they found, and what was changed as a result. A clean policy that says "access is reviewed quarterly" is insufficient if the evidence does not exist. Many first-time audit failures trace back to this single area.
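
One way to make that evidence reproducible is a scheduled export of privileged group membership to a dated file. A sketch against the Microsoft Graph group-members endpoint; the group ID, token acquisition, and output path are placeholders for your environment.

```python
import csv
import datetime
import requests

GROUP_ID = "00000000-0000-0000-0000-000000000000"  # placeholder group ID
TOKEN = "..."  # acquire via your normal OAuth client-credentials flow

resp = requests.get(
    f"https://graph.microsoft.com/v1.0/groups/{GROUP_ID}/members",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
members = resp.json()["value"]

# Dated artifact in a known location: evidence/access-reviews/YYYY-MM-DD.csv
today = datetime.date.today().isoformat()
with open(f"evidence/access-reviews/{today}.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["displayName", "userPrincipalName", "reviewedBy", "decision"])
    for m in members:
        # Reviewer fills in the last two columns; the file itself is the artifact.
        writer.writerow([m.get("displayName"), m.get("userPrincipalName"), "", ""])
```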

Change management. Auditors will sample production changes and ask for the corresponding tickets, approvals, testing records, and post-change verification. Organizations that deploy through CI/CD should have approval workflows that produce evidence automatically. Organizations that deploy manually should have a documented process and ticket trails.
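
For teams deploying through GitHub, a sampling script along these lines reproduces what the auditor will do. This is a sketch against the GitHub REST API; the org, repo, and token are placeholders.

```python
import requests

OWNER, REPO = "your-org", "your-repo"  # placeholders
TOKEN = "..."  # a GitHub token with read access to the repository
headers = {"Authorization": f"Bearer {TOKEN}"}

pulls = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/pulls",
    params={"state": "closed", "per_page": 20},
    headers=headers, timeout=30,
).json()

for pr in pulls:
    if not pr.get("merged_at"):
        continue  # sample only changes that actually reached production
    reviews = requests.get(
        f"https://api.github.com/repos/{OWNER}/{REPO}/pulls/{pr['number']}/reviews",
        headers=headers, timeout=30,
    ).json()
    approved = any(r["state"] == "APPROVED" for r in reviews)
    print(pr["number"], pr["merged_at"],
          "approved" if approved else "NO APPROVAL ON RECORD")
```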

Vendor management. Every third-party service that touches customer data is in scope. Auditors will request evidence that vendors were assessed before onboarding, that contracts include appropriate security terms, and that ongoing monitoring exists. A spreadsheet listing vendors is a starting point, not an endpoint.

Incident response. Auditors will ask about incident history during the audit period and request evidence that incidents were handled according to policy. Organizations that have had no incidents still need to demonstrate that the response capability is exercised (typically through tabletop drills with documented outcomes).

What to do six months before the audit

  • Implement evidence collection automation. If a control depends on a quarterly task, the task should produce a dated artifact in a known location (a minimal helper is sketched after this list).
  • Run a self-assessment. Go through the trust services criteria as if you were the auditor. Note every gap.
  • Address gaps in priority order. Access reviews and change management first; the others next.
  • Engage a readiness assessor before the formal audit. The cost is well below the cost of audit findings, and the relationship continues into ongoing program work.
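
A minimal sketch of that evidence convention; the directory layout and control IDs below are our own illustration, not a standard.

```python
import json
import datetime
import pathlib

EVIDENCE_ROOT = pathlib.Path("evidence")  # placeholder root location

def record_evidence(control_id: str, payload: dict) -> pathlib.Path:
    """Write payload to evidence/<control_id>/<YYYY-MM-DD>.json."""
    out_dir = EVIDENCE_ROOT / control_id
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{datetime.date.today().isoformat()}.json"
    path.write_text(json.dumps(payload, indent=2))
    return path

# Example: a quarterly access review leaves its outcome behind.
record_evidence("CC6.2-access-review", {
    "reviewer": "jane.doe",
    "findings": 2,
    "changes_made": ["removed stale admin from finance group"],
})
```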

We provide SOC 2 readiness assessments, control gap analysis, and ongoing compliance program operations across Type 1 and Type 2 audit cycles.

A phishing simulation report typically arrives with a click rate ("12 percent of users clicked"), a report rate ("28 percent of users reported the message"), and a list of repeat clickers. Leadership reads these and asks whether the program is working. The honest answer requires looking past the headline numbers.

Click rate is a noisy signal

Click rate alone tells you very little. A 12 percent click rate on a well-crafted lure targeting a specific role looks materially different from a 12 percent click rate on a generic mass blast. Comparing click rates across quarters is meaningful only when the difficulty of the lures is held roughly constant — which most programs do not enforce.

The number that matters more than click rate is time-to-report. When a real phishing message lands, the operational question is not how many people clicked. It is how quickly the security team learns that an attack is in progress. A program where ten percent of users report a suspicious message within five minutes is worth considerably more than one where forty percent of users eventually report it.
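
Computing those report-time statistics is straightforward once the events are exported. A sketch with illustrative delays in minutes; real platforms export this data in their own formats.

```python
import math
from statistics import median

# Illustrative delays, in minutes, for users who reported the simulation.
report_delays_min = [2, 3, 4, 5, 7, 9, 12, 30, 45, 180, 400]

def percentile(values, pct):
    """Nearest-rank percentile: the smallest value covering pct% of the data."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

print(f"median time-to-report: {median(report_delays_min)} min")
print(f"95th percentile:       {percentile(report_delays_min, 95)} min")
```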

Repeat clickers are not a training problem

Most organizations with mature phishing programs have a small population of repeat clickers — often 2 to 5 percent of users who click on most simulations regardless of training intervention. The instinct is to assign more training to these users. The actual answer is usually compensating controls: stronger email gateway filtering for those accounts, conditional access policies that require additional verification for high-value actions, and reduced privileges where possible.

Repeat clickers are not a training problem because they are typically high-stress, high-volume employees in roles where they cannot reasonably read every message carefully. The program should adapt to them, not the other way around.

What a useful phishing report looks like

  • Click rate, segmented by lure difficulty and by role/department.
  • Median and 95th-percentile time-to-report for users who reported.
  • Compromise simulation: of users who clicked, how many entered credentials, and how many were caught by downstream controls (MFA, conditional access, EDR).
  • Repeat-clicker list with associated compensating-control recommendations rather than additional training assignments.
  • One specific operational change recommended for the next quarter, with the data that supports it.

We review and tune phishing programs as part of broader security operations engagements, focusing on what the data actually tells you about user behavior and where compensating controls produce more value than additional training.

The migration plan looks straightforward. Move users from the old tenant to the new tenant over a weekend. Update MX records. Send a confirmation email. Done.

The Monday-morning calls start when finance discovers that Bank of America is rejecting their wire-transfer notifications, sales discovers that outbound prospect emails are landing in spam at three of their largest accounts, and the CEO discovers that her response to the board chair didn't arrive.

The mailboxes moved correctly. The deliverability didn't.

The three layers of deliverability that often get missed

SPF, DKIM, and DMARC alignment. When you change the sending platform, the SPF record needs to authorize the new platform's IP ranges. DKIM keys need to be generated by the new platform and published in DNS. DMARC alignment depends on both being configured correctly. Many migrations leave these records only half-updated, with SPF authorizing one platform and DKIM signing from the other, which produces intermittent authentication failures.
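
A pre-cutover check along these lines catches the most common gaps. This is a sketch using dnspython; the domain and DKIM selector are placeholders, and your platform documents its own selector names (Microsoft 365, for example, uses selector1 and selector2).

```python
import dns.resolver

DOMAIN = "example.com"       # placeholder
DKIM_SELECTOR = "selector1"  # placeholder; check your platform's documentation

def txt_records(name: str) -> list:
    try:
        return [r.to_text() for r in dns.resolver.resolve(name, "TXT")]
    except Exception:
        return []

spf = [r for r in txt_records(DOMAIN) if "v=spf1" in r]
dkim = txt_records(f"{DKIM_SELECTOR}._domainkey.{DOMAIN}")
dmarc = txt_records(f"_dmarc.{DOMAIN}")

print("SPF:  ", spf or "MISSING")
print("DKIM: ", dkim or "MISSING")
print("DMARC:", dmarc or "MISSING")
```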

IP reputation warm-up. If you are moving from a dedicated IP to a different dedicated IP, the new IP has no sending reputation. Major receivers (Google, Microsoft, Yahoo) will throttle new IPs until reputation is established. This takes 2-4 weeks of gradually increasing volume. Migrating high-volume senders without a warm-up plan produces deliverability cliffs.
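
A warm-up schedule is simple arithmetic once the parameters are agreed. A sketch with an assumed starting volume and growth factor; the real numbers should come from your provider's deliverability team.

```python
def warmup_schedule(start: int = 500, target: int = 200_000, factor: float = 1.8):
    """Yield (day, daily send cap) pairs, ramping from start toward target."""
    day, volume = 1, float(start)
    while volume < target:
        yield day, int(volume)
        day += 1
        volume *= factor  # assumed growth rate; tune per receiver feedback
    yield day, target

for day, cap in warmup_schedule():
    print(f"day {day:>2}: send cap {cap:,}")
```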

BIMI and brand authentication. If your old setup published a BIMI record (the brand-logo standard), the new setup must continue to satisfy its requirements or your mail will lose its logo in Gmail and other supporting clients. BIMI also depends on DMARC at enforcement (p=quarantine or p=reject), so the logo will not display while DMARC is relaxed for migration monitoring. This is a small thing visually but matters for brand-sensitive senders.
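
The record is easy to verify before and after cutover; a sketch using dnspython, with the domain as a placeholder:

```python
import dns.resolver

DOMAIN = "example.com"  # placeholder
try:
    # BIMI is published as a TXT record at the "default" selector.
    answers = dns.resolver.resolve(f"default._bimi.{DOMAIN}", "TXT")
    print([r.to_text() for r in answers])
except Exception:
    print(f"no BIMI record found at default._bimi.{DOMAIN}")
```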

The migration checklist that prevents most issues

  • Audit all current SPF, DKIM, and DMARC records before migration. Document the existing state.
  • Set DMARC to p=none with reporting enabled at least 30 days before migration (an example record follows this list). Monitor reports to identify legitimate senders that aren't yet authenticated.
  • Add the new platform to SPF and configure DKIM signing before cutover, in parallel with the old platform.
  • Migrate users in waves; monitor deliverability metrics for each wave before proceeding.
  • For high-volume senders, work with the new platform's deliverability team on a warm-up schedule.
  • Keep the old platform's mail routing available as a fallback for 30-60 days post-cutover. Cleanup is the last step, not the first.
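
For reference on the second item above, a monitoring-mode DMARC record looks like the following. The domain and reporting address are placeholders, and the rua mailbox must be able to receive aggregate reports:

```
_dmarc.example.com.  IN  TXT  "v=DMARC1; p=none; rua=mailto:dmarc-reports@example.com"
```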

We coordinate email migrations across Microsoft 365, Google Workspace, and hybrid environments, with deliverability and authentication treated as first-class deliverables rather than afterthoughts.

In incident response engagements, the initial access vector is frequently an account that no one was actively monitoring. A former contractor whose account was technically disabled but still had API tokens active. A service account with a password that hadn't rotated in three years. Domain admin membership granted to a sysadmin who left the company eighteen months earlier.

A defensive program that focuses only on active users and new accounts misses an entire class of risks that quietly accumulate.

Six specific items that often get missed

One — orphaned API tokens and personal access tokens. Disabling a user account does not necessarily revoke their tokens. GitHub, Slack, Microsoft Graph, AWS, and most major SaaS platforms issue tokens that survive account disablement unless explicitly revoked. Token inventory should be part of every offboarding checklist.
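
As one concrete example, AWS access keys survive console access removal and must be deactivated explicitly. A sketch using boto3, with the username as a placeholder; each platform in scope needs its own equivalent step.

```python
import boto3

iam = boto3.client("iam")
username = "departed.contractor"  # placeholder

keys = iam.list_access_keys(UserName=username)["AccessKeyMetadata"]
for key in keys:
    key_id = key["AccessKeyId"]
    last_used = iam.get_access_key_last_used(AccessKeyId=key_id)
    print(key_id, key["Status"],
          last_used["AccessKeyLastUsed"].get("LastUsedDate"))
    # Deactivate rather than delete, so the key can be restored quickly
    # if an unnoticed integration breaks.
    iam.update_access_key(UserName=username, AccessKeyId=key_id,
                          Status="Inactive")
```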

Two — service accounts with stale credentials. Service accounts are routinely exempted from password rotation policies because rotating them breaks integrations. Three years later, a service account has a password that was set when the previous IT director was still employed. Rotation needs to be planned, not avoided.
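
Finding the stale credentials is the easy part. A sketch using the AWS IAM credential report via boto3; the 90-day threshold is an assumption, and the same idea applies to pwdLastSet in Active Directory or any directory's equivalent attribute.

```python
import csv
import io
import datetime
import boto3

THRESHOLD = datetime.timedelta(days=90)  # assumed rotation window
now = datetime.datetime.now(datetime.timezone.utc)

iam = boto3.client("iam")
iam.generate_credential_report()  # may need a short retry loop until ready
report = iam.get_credential_report()["Content"].decode()

for row in csv.DictReader(io.StringIO(report)):
    rotated = row["access_key_1_last_rotated"]
    if rotated in ("N/A", "not_supported"):
        continue
    age = now - datetime.datetime.fromisoformat(rotated.replace("Z", "+00:00"))
    if age > THRESHOLD:
        print(f"{row['user']}: access key 1 last rotated {age.days} days ago")
```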

Three — privileged role membership that survives role changes. When someone moves from sysadmin to project manager, their domain admin membership often stays. Quarterly privileged-access reviews catch this, but only if the reviews are actually conducted and documented.

Four — guest accounts in the tenant. Most Microsoft 365 and Google Workspace tenants accumulate guest accounts from external collaborations that ended years ago. Guest accounts often have access to SharePoint sites, Teams channels, and shared drives. Periodic guest cleanup is not in most operational runbooks.
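
A starting inventory is one Graph call away. A sketch listing guests with their creation dates; token acquisition and pagination are elided as placeholders.

```python
import requests

TOKEN = "..."  # acquire via your normal OAuth flow (User.Read.All scope)
resp = requests.get(
    "https://graph.microsoft.com/v1.0/users",
    params={
        "$filter": "userType eq 'Guest'",
        "$select": "displayName,mail,createdDateTime",
    },
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
# Pagination via @odata.nextLink omitted for brevity.
for guest in resp.json()["value"]:
    print(guest.get("createdDateTime"), guest.get("mail"),
          guest.get("displayName"))
```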

Five — shadow MFA exemptions. Conditional access policies frequently include exemptions for specific users or groups, added during troubleshooting and never removed. A semi-annual review of all conditional access exemptions catches this.
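
The exemptions themselves can be enumerated directly, so the review starts from live data rather than memory. A sketch against the Microsoft Graph conditional access endpoint (Policy.Read.All), with token acquisition elided.

```python
import requests

TOKEN = "..."  # placeholder
resp = requests.get(
    "https://graph.microsoft.com/v1.0/identity/conditionalAccess/policies",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
for policy in resp.json()["value"]:
    users = policy.get("conditions", {}).get("users", {})
    excluded = users.get("excludeUsers", []) + users.get("excludeGroups", [])
    if excluded:
        print(policy["displayName"], "->", excluded)
```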

Six — break-glass accounts that have been used outside emergencies. Break-glass accounts should be used only in emergencies and audited after each use. In practice, they often get used as convenience accounts when normal processes fail. Logs need to be reviewed regularly.

How to build the practice

  • Add token inventory to the offboarding checklist for every system that issues tokens.
  • Schedule quarterly service-account credential rotation, with planned coordination for any integrations that need updating.
  • Run quarterly privileged-access reviews with named reviewers and documented outcomes.
  • Conduct semi-annual conditional-access exemption reviews and tenant guest-account cleanup.
  • Audit break-glass account use monthly. Each use should have a documented justification.

We conduct IAM hygiene assessments as standalone engagements or as part of broader Zero Trust and identity federation programs.

Ready when you are

Engaged when complexity increases and accountability matters.

Begin a confidential conversation about your environment, your risks, and the next decision that needs to be right.

Start a confidential conversation