Starting August 30th, 2023, for Public Status Pages that allow SMS subscriptions, StatusCast will require that a valid email address be confirmed before a person can fully establish a new SMS subscription.
This change in the subscription workflow is intended to help prevent malicious parties from attempting to commit SMS fraud, which has become a growing concern for many SaaS companies dealing with mass notifications. We here at StatusCast have witnessed this trend: in the past 6 months, the quantity of malicious traffic attempting to commit SMS fraud has increased drastically. While we have continued to implement industry best practices to safeguard against this sort of activity, real user confirmation is ultimately the most effective way to prevent such unwanted attention.
StatusCast's engineers were alerted that scheduled maintenance events created from StatusCast's legacy application ("V2") were not properly auto-closing after their estimated duration had been reached. After an initial investigation, engineers identified the cause in the responsible service and applied a patch to correct the error. Any maintenance that was overdue for closure should now have been resolved, and StatusCast's engineers will continue to monitor the legacy process to ensure no other issues occur.
StatusCast’s engineers were alerted that some SMS and email notifications had an unusual delay in their delivery. The team immediately began investigating and determined that one of the notification processors was experiencing a failure that resulted in a queued backup of notifications. This did not impact all notifications sent through StatusCast, only the subset assigned to the instance in question.
Once the instance was isolated, the backup of requests was offloaded and the instance itself was taken out of rotation. Even though the instance itself did not enter an unhealthy state, our engineers have re-evaluated certain health checks to account for queue backups as well as other performance attributes. Additionally, our engineers have scaled out the number of instances to help reduce the load on any single processor at a given time. At this time all notification services are running as expected, but we will continue to monitor the system closely.
StatusCast’s engineers were alerted that some incident notifications were either being delivered slowly or appeared not to be delivered at all. After an initial review, our engineers determined that the service processing notifications was experiencing performance issues, resulting in a queued backlog of notifications.
At this time StatusCast’s engineers have scaled out this service to allow the backlog to clear as efficiently as possible. Notification processing has now returned to its normal state and we will continue to monitor this closely. A root cause analysis will be posted when more information becomes available.
At this time notification services are performing normally.
Summary of impact: Between 1:06PM and 1:59PM EDT on 29 June 2020, some customers may have experienced latency in the delivery of incident notifications. All notification services were recovered by 1:59PM EDT.
Preliminary root cause: Engineers identified the underlying root cause as a server delegation change affecting DNS resolution and resulting in a backlog of queued notifications. This issue impacted a subset of StatusCast’s customers who were delegated to the server in question. Availability of status pages and the administrative portal remained at 100% throughout the incident.
Mitigation: To mitigate, engineers corrected the server delegation issue. To expedite the processing of the server’s backlog, engineers scaled out the service to efficiently distribute the backlog of incidents. Once the backlog was cleared, the service remained at its normal operating state.
Moving Forward: StatusCast is committed to providing its customers a highly reliable and available service. Any time an issue is reported that potentially affects the availability of the status page or the integrity of notification delivery, we treat it with the utmost urgency. Moving forward, we have established new monitoring protocols for our notification system to ensure that latency-created backlogs are properly reported to our engineering staff. We have also taken this as an opportunity to evaluate the current scale of this service and how we can improve upon its functionality.
By continuing to use our services on or after March 25, 2020, you indicate your agreement with each of the updated and new terms and policies that are effective as of that date. If you have any questions or concerns, please email firstname.lastname@example.org or reply to this email.
The StatusCast team will be performing maintenance on Saturday, July 27th, 2019 at 7:00 AM EDT; the estimated duration is 1 hour. We do not expect any impact to your service, but in some cases there may be a brief interruption.
Please be aware that StatusCast is currently experiencing intermittent connectivity issues, which are causing notifications to be delayed or possibly not sent. These are caused by a global network infrastructure issue at Microsoft Azure, StatusCast's hosting provider. Microsoft is currently investigating the issue.
At this time the application appears to be running normally; however, until Microsoft deems the issue fully resolved, we will continue to monitor the application. If you experience an issue sending notifications, please contact our support team at email@example.com. We apologize for any inconvenience caused by these intermittent connectivity issues.
StatusCast's services continue to remain operational. We will continue to monitor the system closely as long as Microsoft's incident remains active.
Microsoft has confirmed that the issue has been mitigated and connectivity to all services should have returned to a normal state.
A summary from Microsoft regarding this issue is below:
Network Connectivity - DNS Resolution
Summary of impact: Between 19:43 and 22:35 UTC on 02 May 2019, customers may have experienced intermittent connectivity issues with Azure and other Microsoft services (including M365, Dynamics, DevOps, etc). Most services were recovered by 21:30 UTC with the remaining recovered by 22:35 UTC.
Preliminary root cause: Engineers identified the underlying root cause as a nameserver delegation change affecting DNS resolution and resulting in downstream impact to Compute, Storage, App Service, AAD, and SQL Database services. During the migration of a legacy DNS system to Azure DNS, some domains for Microsoft services were incorrectly updated. No customer DNS records were impacted during this incident, and the availability of Azure DNS remained at 100% throughout the incident. The problem impacted only records for Microsoft services.
Mitigation: To mitigate, engineers corrected the nameserver delegation issue. Applications and services that accessed the incorrectly configured domains may have cached the incorrect information, leading to a longer restoration time until their cached information expired.
For more information from Microsoft, please visit their status page at https://azure.microsoft.com/en-us/status/history/
The StatusCast team will be performing maintenance on Sunday, April 28th, 2019 at 7:00 AM EDT; the estimated duration is 1 hour. We do not expect any impact to your service, but in some cases there may be a brief interruption.
The StatusCast team will be performing maintenance on Sunday, April 7th, 2019 at 7:00 AM EDT; the estimated duration is 1 hour. We do not expect any impact to your service, but in some cases there may be a brief interruption.
The StatusCast team will be performing maintenance on Sunday, March 31st, 2019 at 7:00 AM EDT; the estimated duration is 1 hour. We do not expect any impact to your service, but in some cases there may be a brief interruption.