Backup fails with RequestTimeTooSkewed error (Google Cloud Storage)

The backup operation fails with the message:

The difference between the request time and the server’s time is too large.

The VMs are synced by Google Public NTP (default Google Cloud time synchronization).

I’ve tried both the S3-compatible and the GCS syntax.

Perhaps this is useful:

The other way you can avoid this error is to use OAuth2 instead of HMAC for authentication (OAuth2 does not require clock synchronization the way HMAC does).
Source: Stack Overflow, "Error on upload: The difference between the request time and the server time is too large"
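
To illustrate why HMAC-style request signing depends on the clock, here is a minimal sketch. It is not the actual GCS or S3 signing algorithm: the message layout, the `SECRET` value, and the `sign` and `server_accepts` helpers are all hypothetical, and the 15-minute window is the S3 figure quoted later in this thread. The point is only that the client's timestamp is part of what gets signed, so a server can reject a request whose timestamp is too far from its own clock.

```python
import hashlib
import hmac

# Hypothetical values for illustration only; not real credentials or limits
# taken from GCS documentation.
SECRET = b"example-secret"
MAX_SKEW_SECONDS = 15 * 60


def sign(method: str, path: str, timestamp: int) -> str:
    """Sign the request together with the client's timestamp."""
    message = f"{method}\n{path}\n{timestamp}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def server_accepts(method: str, path: str, timestamp: int,
                   signature: str, server_now: int) -> bool:
    """Reject stale or future-dated timestamps, then verify the signature."""
    if abs(server_now - timestamp) > MAX_SKEW_SECONDS:
        return False  # the "request time too skewed" case
    expected = sign(method, path, timestamp)
    return hmac.compare_digest(expected, signature)
```

In this sketch a request signed at time `t` is accepted if it arrives a minute later, but rejected if it arrives 16 minutes later, even though the signature itself is still valid.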

736022559921 2020-12-29 22:54:32.731   INFO: BACKUP DATABASE laravel
736022559981 2020-12-29 22:54:32.731   INFO: Kicking off a distributed backup for database `laravel` to Google Cloud Storage Target 'laravel'
736023611816 2020-12-29 22:54:33.783 DETAIL: Calling backup with: backup/backup --storage-type gcs --validate-for-backup --target **********/laravel.backup/laravel
736023789126 2020-12-29 22:54:33.960 DETAIL: Calling backup with: backup/backup --storage-type gcs  --target **********/laravel.backup/laravel.manifest
736023792479 2020-12-29 22:54:33.964   INFO: Thread 115054: TakeDatabaseSnapshotHelper: Starting snapshot for db `laravel`.
736023792523 2020-12-29 22:54:33.964   INFO: Thread 115054: BeginSnapshot: `laravel` log: Taking snapshot `laravel`, LSN 0x400000b00, term 0x10, version 0x158, prevEpoch 0.
736023890657 2020-12-29 22:54:34.062 DETAIL: Calling backup with: backup/backup --storage-type gcs  --target **********/laravel.backup/laravel
736024186907 2020-12-29 22:54:34.358   INFO: Thread 115054: TakeDatabaseSnapshotHelper: Snapshot for db `laravel` succeeded.
737104965662 2020-12-29 23:12:35.137 DETAIL: Calling backup subprocess with: backup/backup --storage-type gcs --target **********/laravel.backup/BACKUP_INCOMPLETE --get-size --expect-not-found
737105097244 2020-12-29 23:12:35.268 DETAIL: Calling backup with: backup/backup --storage-type gcs  --target **********/laravel.backup/BACKUP_INCOMPLETE
737105384674 2020-12-29 23:12:35.556  ERROR: Failed taking a distributed backup for database `laravel` to directory 'laravel' failed with (2205:Leaf Error (memsql-leaf-1:3306): Backup subprocess nonzero exit value.  The difference between the request time and the server's time is too large.)

Was backup working prior to the 7.3 upgrade?

The only recommendation I can make would be to resync the server time/restart NTP.

@nhoran I thought so, but the backup history shows that the issue actually started on 21 December. The upgrade was done on 26 December.

The cluster ran out of allocated memory on the 21st, which prevented new queries from being executed. After freeing some memory and restarting the cluster, I thought everything was operational again.

Can you imagine any reason why backups wouldn’t work after a “soft crash”?

I just followed your suggestion and reloaded the NTP service on all cluster nodes. A new backup is running now. I’ll let you know the result.

@nhoran The backup also failed after restarting the NTP services.

I just created a new, empty database and backed it up. The backup succeeded in 2 seconds.

The last successful backup was ~400 GB. The failure always happens after ~15 minutes, and data is written to GCS before it fails.

Could the issue be that the process runs so long that the GCS access token expires? That said, the last successful backup completed in ~25 minutes.

According to the GCS browser in console.cloud.google.com, the last successful backup is ~380 GB and the failing backups are ~370 GB.

Some data was deleted between the successful and the failing backups, so it may be that the backup data is actually written in full and the failure happens while writing some final metadata. Just a thought.

The failures at around 15 minutes seem suspicious given this description of Amazon S3 behavior:

Furthermore, the client timestamp included with an authenticated request must be within 15 minutes of the Amazon S3 system time when the request is received. If not, the request will fail with the RequestTimeTooSkewed error code.
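
As a rough check of that suspicion, the sketch below compares two timestamps copied from the log above against the 15-minute window. It assumes, purely as a hypothesis that nothing in this thread confirms, that a request timestamp fixed near the start of the backup is reused for a later call. The `parse_log_time` and `within_window` helpers are illustrative names, not part of any backup tool.

```python
from datetime import datetime, timedelta

# Window quoted above for S3-style signed requests.
MAX_SKEW = timedelta(minutes=15)


def parse_log_time(stamp: str) -> datetime:
    """Parse a timestamp in the format used by the log lines in this thread."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")


def within_window(signed_at: datetime, received_at: datetime) -> bool:
    """Return True if the two times differ by no more than MAX_SKEW."""
    return abs(received_at - signed_at) <= MAX_SKEW


# Timestamps copied from the log: start of the backup and the last
# backup call before the error.
backup_started = parse_log_time("2020-12-29 22:54:32.731")
failing_call = parse_log_time("2020-12-29 23:12:35.268")

elapsed = failing_call - backup_started
print(f"elapsed: {elapsed}")
print(f"within 15-minute window: {within_window(backup_started, failing_call)}")
```

In this particular run the gap between the two log lines is about 18 minutes, so a timestamp taken at the start would already be outside the window by the time of the final call.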

Let me know if I can help troubleshoot further in any way.

Just a quick update on this post.

We made a fix in the 7.5 release that addresses this problem. We are also looking at backporting the fix to an upcoming 7.3 patch release.

-Adam