Toyota Shut Down 14 Factories Due to 'Insufficient Disk Space'

Even major car manufacturers can never have too much storage space. In late August, Toyota had to shut down 28 assembly lines at 14 auto plants in Japan due to computer issues. Today, Toyota provided more details on the shutdown, including that "an error occurred due to insufficient disk space."
The robots log everything, line by line and command by command: what was sent to them, what they executed, and what their sensors recorded as actually having been executed.
For those familiar with 3D printers, think of a logging server sitting between an OctoPrint server and the printer: it records the commands sent from OctoPrint, the commands received by the printer, and the readings from the sensors on the motors, to confirm the movements actually match what was commanded.

That generates a crapload of logs, and they need to keep them in case anything gets investigated, because if there's a problem you'd better believe they want to make sure it wasn't something at that level.

So it takes up a lot of space very quickly, and when the robots can't log, they stop.

Had the same thing happen at a GM plant back in the early 2000s. The RS232 controller wasn't properly signaling the robots, and they assumed it wasn't being recorded. Not a fun night.
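To make the "when it can't log, it stops" behavior concrete, here's a minimal Python sketch of that kind of traceability logging with a free-space guard and rotation. Every path, threshold, and name below is hypothetical; nothing here is based on Toyota's actual systems.

```python
import shutil
import logging
from logging.handlers import RotatingFileHandler

# Hypothetical values for illustration only.
LOG_DIR = "/var/log/robot-traffic"      # assumed log location
MIN_FREE_BYTES = 50 * 1024**3           # refuse to run with < 50 GiB free

# Rotation keeps any single log file from eating the whole volume:
# 10 files of 1 GiB each, oldest dropped automatically.
handler = RotatingFileHandler(
    f"{LOG_DIR}/line-controller.log",
    maxBytes=1 * 1024**3,
    backupCount=10,
)
logger = logging.getLogger("line-controller")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def record(sent: str, executed: str, sensed: str) -> None:
    """Log what was sent to the robot, what it executed, and what its sensors saw."""
    _, _, free = shutil.disk_usage(LOG_DIR)
    if free < MIN_FREE_BYTES:
        # The fail-safe described above: if the audit trail can't be written,
        # stop the line rather than keep running unrecorded.
        raise RuntimeError("insufficient disk space for traceability logs")
    logger.info("sent=%s executed=%s sensed=%s", sent, executed, sensed)
```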
 
"The company further wrote that the servers were running on the same system as its backup, causing the same issue there, so the company couldn't make a switch."

This strikes me as a failure of engineering right there, and I'd be curious as to exactly what went wrong so everyone else can avoid following that example.

Since it's a matter of running out of disk space, if I had to guess, they made the mistake of not putting a proper cap/quota on shared storage (probably some kind of NAS or SAN for the servers) that also happened to host their backups. It's even worse if they don't have a redundant SAN and just set up an "inverted pyramid of doom" infrastructure for their servers.

Backups ought to be possible even with the systems running live, especially if they had a lick of IT sense and virtualized all their servers for ease of migration to any physical host.
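The article only says the backup ran "on the same system," so the exact topology is anyone's guess. As a sanity-check sketch (the paths are made up), one cheap test is whether the backup target even lives on a different filesystem than the data it protects:

```python
import os

# Hypothetical mount points, purely for illustration.
PRIMARY_DATA = "/mnt/production-db"
BACKUP_TARGET = "/mnt/backups"

def same_filesystem(path_a: str, path_b: str) -> bool:
    """True if both paths resolve to the same mounted filesystem (same st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev

if same_filesystem(PRIMARY_DATA, BACKUP_TARGET):
    # If a full disk takes out the primary, it takes out the "backup" too --
    # roughly the failure the article describes.
    raise SystemExit("backup target shares a filesystem with production data")
```

A check like this only catches the same-filesystem case; two LUNs carved from the same non-redundant SAN would still pass while sharing a single point of failure, which is the "inverted pyramid of doom" part.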
 
It screams "we got sold on hyperconverged infrastructure and didn't modify our policies accordingly."
I've seen it, maybe even done it...
But if you configure your policies wrong, it's not hard to run the drives out of space on an HCI stack.
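For anyone who hasn't sized an HCI cluster before, usable space shrinks fast once the replication factor and rebuild headroom come out of the raw number. A back-of-the-envelope sketch with made-up figures:

```python
def usable_capacity_tb(nodes: int, raw_per_node_tb: float,
                       replication_factor: int = 2,
                       rebuild_headroom_nodes: int = 1,
                       slack: float = 0.10) -> float:
    """Rough usable capacity for an HCI cluster (generic assumptions, not any vendor's math).

    Every block is stored `replication_factor` times, you want enough free
    space left to re-protect data after `rebuild_headroom_nodes` nodes fail,
    and most platforms want some slack on top of that.
    """
    survivable_raw = (nodes - rebuild_headroom_nodes) * raw_per_node_tb
    return (survivable_raw / replication_factor) * (1 - slack)

# Four nodes with 20 TB raw each at replication factor 2:
print(round(usable_capacity_tb(4, 20), 1))   # ~27 TB usable, not the 80 TB on the invoice
```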
 
If this is Japan, I picture an errant fax machine jammed up and a whirring 5 1/4" floppy drive waiting for a carrier tone to connect, totally crippling the entire production facility...
 
Hah, dealt with something similar last week. At work, our image backup solution (cough, Acronis) decided to hang while writing a log file and ballooned it to over 2 TB. Luckily I caught it before the server ran out of space.
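Not specific to Acronis (no idea what its log layout actually looks like); just a generic watchdog sketch that flags any file in a log directory that has blown past a size threshold. Paths and limits are made up:

```python
from pathlib import Path

# Hypothetical values for illustration only.
WATCH_DIR = Path("/var/log/backup-agent")
MAX_FILE_BYTES = 10 * 1024**3            # flag anything over 10 GiB

def oversized_files(watch_dir: Path, limit: int) -> list[tuple[Path, int]]:
    """Return (path, size) for every regular file under watch_dir larger than limit."""
    return [(p, p.stat().st_size)
            for p in watch_dir.rglob("*")
            if p.is_file() and p.stat().st_size > limit]

for path, size in oversized_files(WATCH_DIR, MAX_FILE_BYTES):
    # In production this would page someone; printing keeps the sketch simple.
    print(f"WARNING: {path} is {size / 1024**3:.1f} GiB and still growing")
```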
 
Inventory. Probably a 15-year-old AIX system still barely kicking, attached to a SAN, and either it rebooted and they couldn't remap the LUNs or the SAN itself had an issue. I've seen this a time or two in the auto industry for various reasons. Replacing it has probably been on their roadmap/to-do list for the last 10 years and they just never got around to it.
 
I joke, but suppose they built a server stack in a standard HCI config, running a two-node failover with their primary and backup VMs on the stack and no external witness for quorum. They would (1) have far less storage space than they realized, and (2) be replicating that data in multiples across the stack. It's a "don't do this" 101 scenario.

Classic "new servers, old way of managing them" failure.

It's made even worse if they have multiple sites replicating to each other for data redundancy.

Those poor HDDs, burnt to a crisp before their time.
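To make the quorum point concrete: with two nodes and no external witness, losing either node (or just the link between them) leaves the survivor below a majority, so it can't safely keep serving. The arithmetic as a tiny sketch:

```python
def has_quorum(total_votes: int, reachable_votes: int) -> bool:
    """A cluster keeps quorum only while a strict majority of votes is reachable."""
    return reachable_votes > total_votes // 2

# Two-node cluster, no witness: 2 votes total.
print(has_quorum(2, 1))   # False -- lose one node and the survivor must stop

# Same two nodes plus an external witness: 3 votes total.
print(has_quorum(3, 2))   # True -- survivor + witness still form a majority
```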
 