Berkeley Lab IT Status


(Resolved) Saturday, November 7 2020 ~4:00PM

Update 9:30 PM:

The 50A-2109C scientific colo facility is back online and available for system owners to start their servers back up. We believe that all other network systems are also functioning normally at this time.

Update 7:50 PM:

Power has been restored to the impacted buildings. We believe that network and other communications have returned to normal operation. The 50A-2109C scientific colo facility remains offline while Facilities troubleshoots cooling in Building 50A.

IT will continue to work to verify that all systems are operating normally and bring the remaining systems online as soon as cooling is available.


Original notice at 4:30 PM:

Unanticipated power outages as a result of generator testing work for the site have caused the shutdown of numerous IT systems. IT staff are onsite shutting down systems in order to manage heat load in the IT Datacenters. No ETA for restoration.

Details:

50, 50A, 50B, 50C, 50E, 50F, 54, 55, 55A, 56, 56A, 56M, 60, 63, 64, 65, 65A, 65B, 70, 70A, 71, 71A, 71B, 71 Trailers, 88, 91, 91U, 90 Complex

At 16:00:05 this afternoon, power was lost to the above buildings. Network services in all of the listed buildings experienced disruptions, although some limited network service fed by standby generators has come back online since that time. Other parts of the network survived the initial outage on battery backup but are failing as the batteries are exhausted.


IT has no ETA for restoration of power at this time. When we have information about power restoration and network service restoration, we will update this announcement.


IT has shut down systems in 2109 and 2275 to manage heat and power load while on generator backup. Core business systems, cyber systems, and productivity systems are not impacted. HPCS systems, scientific colo in 2109, and numerous other systems are offline.


(Resolved) Thursday, October 15, 2020 12:30 PM - Commons

We are experiencing issues loading Commons pages. Our team is working to resolve the issue.

(Resolved) Monday, September 14, 2020 - MFA Issues

Users may experience some issues when using MFA to log in to services (Gmail, Calendar, LETS, FMS, etc.). We are currently investigating these issues.

(Resolved) Sunday, September 13, 2020 - MFA Issues

A brief MFA outage began at approximately 4:30 PM today. The issue was resolved by 6 PM.

(Completed) Saturday, September 12, 2020 - Multi Factor Authentication Upgrade


On Saturday, September 12, 2020, between 10 PM and 12 AM Sunday, IT will be performing an upgrade of the servers that handle Multi Factor Authentication (MFA) for the Lab.

While we expect limited to no interruption to the MFA service during this period, there is a potential for outages lasting as long as 30 minutes. If such an outage occurs, it may impact logging into services that require MFA, including the Lab's Web single sign-on service at login.lbl.gov, FMS, and IT's HPC clusters.

During the upgrade period, users will not be able to create new tokens or modify existing ones. Should you use a Yubikey to log in during this period and later experience problems, we recommend that you log in to the MFA Management application at https://identity.lbl.gov/mfa and resync the affected token.

If you experience any other issues with your MFA tokens or Yubikey, please contact the IT Help Desk at help@lbl.gov.

(Resolved) August 24, 2020 - Zoom Outage

Zoom is reporting outages that affect meetings and webinars. For further information, please visit https://status.zoom.us/.

Alternatively, you may use Google Meet to host your meetings while Zoom resolves the issue. To start or join a Meet meeting, visit https://meet.google.com/.

(Resolved) August 16, 2020 - Commons Outage

Updated 8/16/20 6PM

As of 6PM, service has been restored to commons.lbl.gov

Starting on Sunday, August 16 at around 10:45 AM, commons.lbl.gov, the Lab's Confluence server, was offline for about 7 hours due to an unscheduled outage.

(Resolved) August 6, 2020 - Bitbucket Repository Write Issues

Updated 8/6/20 5:45PM

The read-only issues with the Lab's Bitbucket repositories have been resolved as of 5:35PM, so repositories are writeable again.

The issues were first reported at 3:40PM (Pacific).

(Completed) August 5, 2020 - Scheduled Commons Outage

Updated 8/5/2020 10PM

On Wednesday, August 5 starting at 6PM, commons.lbl.gov, the Lab's Confluence server, was offline for about 3.5 hours during a scheduled maintenance window.

(Completed) July 10, 2020 - Scheduled Email Change

Updated 7/10/2020 5PM

On Friday, July 10 at about 4PM, the lbl.gov MX record changed from mailgate.lbl.gov to Google's MX record values: https://support.google.com/a/answer/174125

This change was made to protect the Lab's critical email communications in the event of another PSPS, hardware and/or software failures in the email pipeline, or any other outages in the LBL infrastructure because, in most instances, email will no longer flow through the Hill before being delivered.

More detailed documentation about this email migration is available at: https://commons.lbl.gov/x/QhD5Cg

Please contact emailteam@lbl.gov if you notice any related issues after SMTP traffic starts shifting directly to Google.

March 18, 2020 - Brief Power Outage

Updated 3/18/2020 10:12AM

There was a brief power outage affecting the Lab at approximately 9:00AM today.

Power has been restored, but not all desktop computers automatically reboot after a power loss. As a result, many people are likely unable to connect to their Lab desktop computer because it is now powered off. If you need assistance getting your Lab desktop computer powered back on, please contact the IT help desk.

IT Provided HPC Services are still in the process of recovering from this outage.

March 2020 - COVID Curtailment

Updated 3/15/2020

IT is currently operating normally.

Most services are anticipated to continue in normal operation during the curtailment.

Beginning on Friday, March 20, deskside support (in-person troubleshooting by support staff at your workstation) will be curtailed, with most support being provided remotely.


November 20, 2019 - PSPS Outage

Updated 11/19/2019 2:11PM

The PSPS Outage has been cancelled. No systems will be proactively shut down at LBL. Normal operations are anticipated.

November 19, 2019 - Network Outage - Resolved

Resolved. Prior: IT is investigating a large-scale network outage at the Laboratory that began around 10:20AM. No ETA available for restoration.

AT&T In-Building Mobile Phone Outage 04 November 2019

Updated 11/02/2019 12:44PM

AT&T in-building mobile phone service on the main LBL Hill Campus will be offline Monday, 04 November, 2019 from 06:00AM to 02:00PM for hardware maintenance by AT&T to repair and/or replace failed components.

Questions? Email tsc@lbl.gov

PG&E Public Safety Power Outage 25 October 2019

Updated 10/29/2019 4:16PM

All IT services including HPC have returned to normal operation.




PRESERVED OUT OF DATE INFORMATION

Updated 10/29/2019 at 11:28AM

IT is engaged in the re-energization of the Laboratory. We anticipate that all services except HPC will be available by the time the Laboratory is re-opened. HPC services require full cooling capacity in Building 50, which generally occurs 24 hours after re-energization.

At this time, IT service availability is generally the same as during the last PSPS Outage (though this is subject to change).

What systems are operational?

  • All Google Suite Services (Gmail, Docs, Drive, Calendar, etc) (For clarity, access to LBL email should function normally for all devices).

  • All Identity Management Services (login, phonebook, directory, etc)

  • Business Systems (FMS, HRIS, EHS Systems, etc)

  • Cloud Based Websites (Google Sites, WPengine, Pantheon, etc)

  • www2.lbl.gov, status.lbl.gov, and commons.lbl.gov

  • Core Networking and VPN

  • Telephones and Voicemail

  • Security Systems including Physical Access Control (where controllers have power)

  • Science Project Storage Service (SPSS) (Operational, but can be shut off at any time due to heat load issues)

  • Shared Drives (G, H, Grouper, etc) (Operational, but can be shut off at any time due to heat load issues)

  • The Helpdesk is operational at help.lbl.gov and available to provide assistance to offsite facilities and telecommuting employees.

  • All IT service teams are either fully operating remotely or on call (e.g. networking, telephony, security, collaboration, etc).

What systems were shut down?

  • LBL IT Provided HPC Services (Shut Down as of 10/25 COB)

  • Science Virtual Machine Services (SVM) (Shut down as of 10/26 9AM)

  • Systems in the Colo Facility (2109) (Operational for all customers; previously shut down as of 10/26 2PM)


Why do you use the word “Anticipated”?

Emergency power is unpredictable. If generators fail, are required to be shut down, or run out of fuel - or if cooling is insufficient to safely manage systems - systems will be shut down.


Questions? Email it-communications@lbl.gov



In the event of any disruption of IT services, this site will provide a means for Lab employees to communicate with each other and stay informed on the status of IT's operations.

The Emergency Status Information Line is: 1-800-445-5830.