...

The proposed approach is to implement a scalable lease mechanism that does not delay online rebuild. It proceeds as follows:

  • The management service regularly issues an IV broadcast across all the engines to push a new lease deadline.

  • If a server passes the lease deadline, it tries to read the IV variable storing the latest deadline from its parent. If the parent does not reply, it tries to read from the IV root.

  • A node is allowed to serve I/Os until the latest lease deadline + lease time.

  • Upon exclusion, the node is marked as down in the pool map immediately and rebuild can proceed. However, new writes issued from client nodes to objects impacted by the rebuild should be delayed until the lease time has passed.
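
The timing rules above can be sketched as follows. This is a minimal Python model for illustration, not DAOS code: the class `LeaseState`, its methods, the helper `writes_allowed_after_exclusion`, and the use of plain numeric timestamps are all assumptions. The choice to treat the end of the serving window as exclusive, and to ignore deadlines older than the one already held, are also assumptions that the text does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LeaseState:
    """Hypothetical per-engine view of the lease (names are illustrative)."""
    lease_time: float  # grace period granted past the deadline
    deadline: float    # latest lease deadline received so far

    def on_broadcast(self, new_deadline: float) -> None:
        # Assumption: deadlines only move forward, so stale values are ignored.
        self.deadline = max(self.deadline, new_deadline)

    def needs_refresh(self, now: float) -> bool:
        # Once the deadline has passed, the engine should try to re-read it.
        return now > self.deadline

    def can_serve_io(self, now: float) -> bool:
        # I/O is allowed until the latest deadline + lease time.
        return now < self.deadline + self.lease_time

    def refresh(self,
                read_parent: Callable[[], Optional[float]],
                read_root: Callable[[], Optional[float]]) -> bool:
        # Ask the parent first, then fall back to the root.
        # A reader returns None when the peer does not reply.
        for read in (read_parent, read_root):
            value = read()
            if value is not None:
                self.on_broadcast(value)
                return True
        return False


def writes_allowed_after_exclusion(now: float, excluded_at: float,
                                   lease_time: float) -> bool:
    # Survivors hold back new writes to impacted objects until the excluded
    # node's lease can no longer be valid.
    return now >= excluded_at + lease_time
```

In this model rebuild is never gated on the lease: only new writes to impacted objects wait for `writes_allowed_after_exclusion` to become true.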

The following changes are thus required:

  • IV trees are currently created per pool. A new system-wide IV tree will thus be required. This tree can also be used to propagate system attributes.

  • A new ULT must be started in the mgmt service to regularly issue an IV broadcast that updates the deadline.

  • The logic to refresh the lease once the lease deadline has passed must be implemented on the engine.

  • The engine should return a special error code to the client once the lease deadline + lease time has passed. On receiving this error code, the client should try to refresh the pool map and resubmit the RPC.

  • Surviving engines must delay write processing until the lease time has passed after exclusion.
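
The client-side handling of the special error code might look like the sketch below. Everything named here is hypothetical: the `LEASE_EXPIRED` value, the `submit_with_lease_retry` helper, the callback signatures, and the bounded retry count are assumptions made for illustration, since the text does not define the error code or a retry policy.

```python
from typing import Any, Callable, Tuple

LEASE_EXPIRED = "lease_expired"  # hypothetical stand-in for the special error code


def submit_with_lease_retry(send_rpc: Callable[[], Tuple[str, Any]],
                            refresh_pool_map: Callable[[], None],
                            max_attempts: int = 3) -> Tuple[str, Any]:
    """Resubmit an RPC after refreshing the pool map on a lease-expired reply.

    max_attempts is an assumed bound; the design does not specify one.
    """
    for _ in range(max_attempts):
        status, result = send_rpc()
        if status != LEASE_EXPIRED:
            # Success or an unrelated error: hand it back unchanged.
            return status, result
        # The target engine lost its lease: refresh the pool map so the
        # resubmitted RPC can be routed according to the new map.
        refresh_pool_map()
    return LEASE_EXPIRED, None
```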

User Interface

How is the user/admin expected to interact with the new feature? Describe the API/tool.
What are all the tunables provided to the user/admin?
Any extra statistics that should be reported to the user/admin?
Explain how errors will be handled.

...