Update - The network issues in 50A-2109C colo and ALS sector 5 have been resolved.

The remaining network issue is the FlexLab firewall network.
Oct 15, 09:41 PDT
Update - Networking is restored to all locations except for:
* Some possibly degraded networking in 50A-2109C colo
* Beamline network in ALS sector 5
* FlexLab firewall network
LBLnet is working on restoring full service to those locations.

Shared Drives (G, H, Grouper, etc.) and Science Project Storage System (SPSS) are back online. Business Systems are operational. We expect HPC systems to be online later this afternoon.
Oct 14, 13:00 PDT
Update - We continue to monitor the situation.
Oct 12, 18:26 PDT
Monitoring - We continue to monitor the situation.
Oct 12, 18:23 PDT
Update - As of Saturday evening, electrical power and cooling are restored to computer rooms. Networking is restored to most locations. Shared Drives (G, H, Grouper, etc.) and Science Project Storage System (SPSS) are back online.
Oct 12, 18:11 PDT
Update - At approximately 12:18 am, PG&E power to the main lab campus was shut off, causing all network equipment not on battery to power off. In locations with generator power, network coverage has since returned.

Networks at JGI, 971, 977 and 978 continue to operate normally.

Systems anticipated to be operational, including email, identity management, and core business systems, are currently operational.
Oct 10, 06:15 PDT
Investigating - What is the current and expected status of LBL systems?
Currently (10.9.19 at 8:25AM), all systems are operational except for those that were proactively shut down (see below). Some LBL datacenters are expected to be able to operate on emergency power/generators during this outage, so key systems (see below) should be operational.

What systems are anticipated to be operational?
All Google Suite Services (Gmail, Docs, Drive, Calendar, etc.). For clarity, access to LBL email should function normally on all devices.
All Identity Management Services (login, phonebook, directory, etc)
Business Systems (FMS, HRIS, EHS Systems, etc)
Cloud Based Websites (Google Sites, WPengine, Pantheon, etc)
Www2.lbl.gov, status.lbl.gov, and Commons.lbl.gov
Core Networking
Telephones and Voicemail
Security Systems including Physical Access Control

What systems have been shut down?
LBL IT Provided HPC Services
Science Virtual Machine Services (SVM)
Science Project Storage Service (SPSS)
Shared Drives (G, H, Grouper, etc)

What systems are anticipated to go down when power is shut off?
Networking in buildings without power
Access to any systems located in non-powered buildings

Why do you use the word “Anticipated”?
Emergency power is unpredictable. If generators fail, are required to be shut down, or run out of fuel, or if cooling is insufficient to safely manage systems, systems will be shut down. Information will be posted.
Oct 9, 10:15 PDT

About This Site

Current and scheduled IT outages are documented here. Labwide status information is available at status.lbl.gov

Collaboration Systems (Email, Google Suite, Zoom, Smartsheet, Passwords, etc) Operational
Google Apps Gmail Operational
Google Apps Docs Operational
Google Apps Sync for Mobile Operational
Google Apps Calendar Operational
Google Apps Hangouts Operational
Google Apps Sites Operational
Google Apps Sheets Operational
Atlassian Bitbucket Website Operational
Atlassian Bitbucket Mercurial via HTTPS Operational
Google Apps Slides Operational
Zoom Zoom Meetings Operational
Google Apps Talk Operational
Google Apps Groups Operational
Google Apps Drive Operational
Atlassian Bitbucket Git via HTTPS Operational
HelloSign / HelloFax HelloSign API Operational
HelloSign / HelloFax HelloSign Operational
Windows File Shares (G and H Drives) Operational
Networking Operational
Wired Network & Internet Connectivity Operational
Wireless Network Operational
Planned Network Maintenance Operational
Telephones & Cellular Operational
Telephones & Voicemail Operational
Cellular Repeater Services in LBL Facilities Operational
Science Systems Operational
High Performance Computing (Lawrencium) Operational
SVM - Science Virtual Machine Operational
SPSS - Science Project Storage System Operational
Audiovisual Services & Conference Room Support Operational
Business Systems Operational
Status legend: Operational, Degraded Performance, Partial Outage, Major Outage, Maintenance
Past Incidents
Oct 17, 2019

No incidents reported today.

Oct 16, 2019
Completed - The scheduled maintenance has been completed.
Oct 16, 11:00 PDT
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Oct 15, 11:00 PDT
Scheduled - Dear Lawrencium Users,

The Lawrencium supercluster has been back online since early yesterday afternoon. We started bringing up our storage file systems and other infrastructure Sunday evening, and the compute nodes were powered on Monday morning.

Thank you very much for your patience during the power outage. Please contact us at hpcshelp@lbl.gov if you have any questions.

-- HPCS User Services
Oct 15, 10:56 PDT
Oct 13, 2019

No incidents reported.

Oct 12, 2019
Completed - The scheduled maintenance has been completed.
Oct 12, 10:00 PDT
Update - Power is back! The power feeds were restored to the Lab this afternoon. LBNL Facilities will do high voltage switching tonight and then a building-by-building re-energization tomorrow. Our building is currently prioritized towards the end and the datacenter is completely down, so it will be another 12-24 hours before our datacenter group gets the environment stabilized. Assuming everything goes well, we will start bringing up the parallel and NAS filesystems and some infrastructure Sunday evening, and then the compute nodes first thing Monday morning. Our plan is to have Lawrencium and other HPCS systems back online sometime Monday afternoon.
Oct 10, 22:45 PDT
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Oct 9, 10:00 PDT
Update - We will be undergoing scheduled maintenance during this time.
Oct 9, 09:59 PDT
Scheduled - We are anticipating that PG&E power to the Lab will be shut off Wednesday morning (October 9) at 12:01am due to the Fire Weather Watch and likelihood of high winds.

Based on this information, Lawrencium and all other clusters were shut down at 10pm on 10/8 in advance of the power outage. Note that this outage will affect access to the Lawrencium Supercluster, SVM (Science Virtual Machine) service, Software Modules Farm, and any systems in the 50B-1275 and 50A-2109 data centers.


We apologize for the inconvenience. Users can expect these systems to be back in production about 8 hours after power is restored.
Oct 9, 09:56 PDT
Oct 11, 2019

No incidents reported.

Oct 8, 2019

No incidents reported.

Oct 7, 2019

No incidents reported.

Oct 6, 2019

No incidents reported.

Oct 5, 2019

No incidents reported.

Oct 4, 2019

No incidents reported.

Oct 3, 2019

No incidents reported.