Multiple products impacted by search failures

Incident Report for Customer Service Management

Postmortem

Summary

On April 8, 2026, between 04:46 UTC and 12:09 UTC, search functionality was unavailable or degraded across several Atlassian Cloud products, including Jira, Confluence, Jira Service Management, Rovo, Rovo Dev, Loom, Guard Standard, Customer Service Management and Atlassian Administration.

A configuration change increased the resources reserved for a core system component that runs on nodes in our compute platform. On a subset of clusters configured for high‑density workloads, the increased reservations exceeded available node capacity, interrupting search and related experiences for affected customers.

The root cause was identified and a rollback was merged at 05:42 UTC, with some systems recovering by 07:33 UTC. Core search functionality was restored by approximately 08:55 UTC, and full downstream recovery completed by 12:09 UTC.

IMPACT

During the impact period, some customers experienced outages or degradation in search across Jira, Confluence, Jira Service Management, Rovo, Rovo Dev, Loom, Guard Standard, Customer Service Management and Atlassian Administration. Other experiences that rely on search, such as quick find, navigation, AI assistants, and dashboards, were also intermittently affected during this period.

Impacted customers may have been unable to find pages or recordings; experienced degraded performance when searching for issues; received empty or delayed search results; or found that AI assistants and dashboards could not retrieve relevant context.

Jira, Jira Service Management and Customer Service Management: Search and experiences that depend on search, such as finding issues and agent responses in CSM, remained available but with degraded performance in fallback mode. By 12:09 UTC, search indexes and search performance were fully restored from fallback to full capacity across all regions.

Guard Standard and Atlassian Administration: Search functionality was unavailable for parts of the incident window. As a result, Domain Claims, usage tracking, and managed accounts were degraded during those periods. These services were restored to operational status by 07:33 UTC. Guard Premium was not impacted by this issue.

Confluence: Search functionality was unavailable for parts of the incident window. Recovery began at 07:30 UTC as backend search clusters were restored. Full recovery, including search index replay, completed at 11:37 UTC.

Loom: Search functionality and some experiences that rely on Confluence search (such as sharing to spaces) were unavailable for portions of the window and fully restored at 11:37 UTC.

Rovo and Rovo Dev: Rovo agents remained responsive but experienced degraded functionality due to loss of search capabilities in underlying services. They were unable to reliably return context about work items or pages. Functionality was fully restored at 11:37 UTC.

ROOT CAUSE

Atlassian products rely on OpenSearch clusters to power their search capabilities, including issue search, content search, and AI-powered search features.

An infrastructure configuration change increased resource reservations (CPU and memory) for a system component that runs across our compute platform. On a subset of clusters configured for high-density workloads, the increased reservations exceeded available node capacity. As a result, search workloads were evicted and, in some clusters, could not be rescheduled onto any available nodes, impacting search functionality across affected products.

The change was deployed across multiple production clusters in a short time frame, limiting the opportunity to detect the capacity conflict in a smaller subset of clusters before it reached the wider fleet. Automated scaling systems attempted to recover by provisioning additional capacity, but in the worst‑affected clusters this led to runaway node scaling and exhaustion of available network resources, prolonging recovery time.
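
To make the capacity conflict concrete, the sketch below walks through the arithmetic with purely hypothetical numbers (a 64 GiB node, a search pod requesting 58 GiB, and a system reservation growing from 2 GiB to 8 GiB). The assumption that the autoscaler sized new nodes using the old reservation is an illustrative reading of the remediation item below on aligning autoscaling capacity calculations with node reservations, not a description of the actual implementation.

# Hypothetical illustration only: none of these numbers or names reflect the
# actual configuration of our compute platform.

NODE_MEM_GIB = 64                 # memory on one hypothetical high-density node
SEARCH_POD_MEM_GIB = 58           # memory requested by one hypothetical search pod
OLD_SYSTEM_RESERVED_GIB = 2       # system-component reservation before the change
NEW_SYSTEM_RESERVED_GIB = 8       # reservation after the change

def pod_fits(node_mem: int, system_reserved: int, pod_mem: int) -> bool:
    """Scheduler-style check: does the pod fit once the system reservation is taken?"""
    return node_mem - system_reserved >= pod_mem

# Before the change the search pod fits (64 - 2 >= 58); after it no longer does
# (64 - 8 < 58), so the pod is evicted and cannot be rescheduled onto any
# identically sized node.
print(pod_fits(NODE_MEM_GIB, OLD_SYSTEM_RESERVED_GIB, SEARCH_POD_MEM_GIB))  # True
print(pod_fits(NODE_MEM_GIB, NEW_SYSTEM_RESERVED_GIB, SEARCH_POD_MEM_GIB))  # False

# Runaway scaling: if the autoscaler still estimates new-node capacity with the
# old reservation, every scale-up looks like it will place the pending pod, but
# the scheduler (using the new reservation) never can, so nodes keep being added
# until another limit, here pod IP addresses, is exhausted.
pending_pods, nodes_added, free_pod_ips = 10, 0, 200

while pending_pods > 0 and free_pod_ips > 0:
    if pod_fits(NODE_MEM_GIB, OLD_SYSTEM_RESERVED_GIB, SEARCH_POD_MEM_GIB):
        nodes_added += 1          # autoscaler: "one more node will absorb the backlog"
        free_pod_ips -= 1         # each new node consumes network resources
    if pod_fits(NODE_MEM_GIB, NEW_SYSTEM_RESERVED_GIB, SEARCH_POD_MEM_GIB):
        pending_pods -= 1         # never true here, so pods stay pending

print(nodes_added, pending_pods)  # 200 nodes added, 10 pods still pending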

REMEDIAL ACTIONS PLAN & NEXT STEPS

We understand that service disruptions impact your productivity. In addition to our existing testing and preventative processes, Atlassian is prioritizing the following actions to help reduce the likelihood and impact of similar incidents in the future and to speed up recovery when issues occur:

  • Enforce smaller deployment cohorts and longer soak periods for critical platform changes on these cluster types
    Implement smaller deployment cohorts, mandatory soak periods between environments, and automated health gates so that changes are validated on a limited set of clusters before being promoted more broadly.
  • Strengthen automated pre‑deploy validation for resource changes
    Add validation checks to ensure resource changes for system components are compatible with node capacity and reserved headroom, preventing system workloads from crowding out customer workloads.
  • Improve post‑deploy verification and alerting
    Enhance monitoring and post‑deployment verification to detect patterns such as spikes in pending pods, runaway node scaling, and low pod‑IP headroom closely correlated with new configuration being rolled out.
  • Align autoscaling behavior with capacity and safety limits
    Align autoscaling capacity calculations with node reservations and introduce safeguards and circuit breakers to prevent runaway scaling and to enforce safe limits on node and pod IP counts; a sketch of this kind of safeguard follows this list.
  • Enhance recovery automation
    Improve automation and runbooks so we can safely disable autoscaling, remove empty nodes in bulk, and restore normal operations faster across multiple clusters in parallel.
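
As an illustration of the autoscaling safeguards described above, the following is a minimal sketch with hypothetical thresholds and names (ClusterState, allow_scale_up, MAX_NODES, MIN_FREE_POD_IPS and MAX_FUTILE_SCALE_UPS are not our actual limits or code): a scale-up is permitted only while it stays within hard node and pod-IP limits, and a circuit breaker trips when repeated scale-ups stop reducing pending pods.

from dataclasses import dataclass

@dataclass
class ClusterState:
    node_count: int
    pending_pods: int
    free_pod_ips: int
    scale_ups_without_progress: int   # scale-ups since pending pods last decreased

# Hypothetical limits for one cluster profile; real limits would be tuned per fleet.
MAX_NODES = 300                       # absolute ceiling on node count
MIN_FREE_POD_IPS = 100                # keep subnet headroom for failover and rollouts
MAX_FUTILE_SCALE_UPS = 3              # trip the breaker if scaling is not helping

def allow_scale_up(state: ClusterState, nodes_requested: int) -> bool:
    """Permit a scale-up only if it stays within capacity and safety limits."""
    if state.node_count + nodes_requested > MAX_NODES:
        return False                  # hard cap on cluster size
    if state.free_pod_ips < MIN_FREE_POD_IPS:
        return False                  # protect remaining network resources
    if state.scale_ups_without_progress >= MAX_FUTILE_SCALE_UPS:
        return False                  # circuit breaker: scaling is not clearing pending pods
    return True

# Example: a cluster that has scaled repeatedly without clearing its pending pods
# is denied further scale-ups and escalated to operators instead.
state = ClusterState(node_count=180, pending_pods=40,
                     free_pod_ips=90, scale_ups_without_progress=5)
print(allow_scale_up(state, nodes_requested=10))  # False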

We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability and to reduce the risk and impact of similar issues in the future.

Thanks,

Atlassian Customer Support

Posted Apr 17, 2026 - 16:23 UTC

Resolved

The issue has now been resolved, and the service is operating normally for all affected customers.
Posted Apr 08, 2026 - 11:43 UTC

Monitoring

The issue has been resolved, and services are now operating normally. Some customers may still experience delays when searching for data changed within the last hour, while new data continues to be indexed. We'll continue to monitor closely to confirm stability.
Posted Apr 08, 2026 - 09:03 UTC

Update

We are also aware of impact from these search failures to Customer Service Management, Asset Reports, Jira Service Management and dashboards within Focus.
Our teams have identified the potential root cause of the issue and are working with urgency on a resolution.
We will provide a further update within 1 hour.
Posted Apr 08, 2026 - 09:03 UTC

Update

We have restored core search functionality and services are operating again, but some customers may still experience delays when searching for data changed within the last hour while new data continues to be indexed. We are actively working to complete reindexing and will update this page as performance fully recovers.
Posted Apr 08, 2026 - 08:33 UTC

Update

We are also aware of impact from these search failures to Customer Service Management, Asset Reports, Jira Service Management and dashboards within Focus.
Our teams have identified the potential root cause of the issue and are working with urgency on a resolution.
We will provide a further update within 1 hour.
Posted Apr 08, 2026 - 08:33 UTC

Update

Confluence and Jira search reliability is improving, and we expect full recovery shortly; we will continue to closely monitor the services to ensure they remain stable.
Posted Apr 08, 2026 - 07:40 UTC

Update

We are also aware of impact from these search failures to Customer Service Management, Asset Reports, Jira Service Management and dashboards within Focus.
Our teams have identified the potential root cause of the issue and are working with urgency on a resolution.
We will provide a further update within 1 hour.
Posted Apr 08, 2026 - 07:40 UTC

Identified

We are also aware of impact from these search failures to Customer Service Management, Asset Reports, Jira Service Management and dashboards within Focus.
Our teams have identified the potential root cause of the issue and are working with urgency on a resolution.
We will provide a further update within 1 hour.
Posted Apr 08, 2026 - 06:25 UTC

Update

This incident is now understood to also be impacting search in Jira and Confluence, with additional downstream impacts to Rovo Chat, User Management, Administration and Guard.

Our team is continuing to investigate with urgency and we will provide a further update within 1 hour.
Posted Apr 08, 2026 - 06:25 UTC

Update

Our team is aware that users are currently experiencing errors while attempting to search in Confluence and Jira. Our team is investigating with urgency and will be providing an update within 1 hour.
Posted Apr 08, 2026 - 06:25 UTC

Investigating

Our team is aware that users are currently experiencing errors while attempting to search in Confluence and Jira. Our team is investigating with urgency and will be providing an update within 1 hour.
Posted Apr 08, 2026 - 06:24 UTC
This incident affected: Customer experience, Customer Service Management AI agent, Support website, Email request, Customer Service Management spaces, Customer profiles, and Automation for Jira.