Ceph datastore settings explanation (question)

Hi!

We have an up-and-running Ceph Luminous cluster with two pools: one is HDD-based, the other flash-only.

I read these docs:
https://docs.opennebula.org/5.4/deployment/open_cloud_storage_setup/ceph_ds.html
https://opennebula.org/installation-of-ha-opennebula-on-centos-7-with-ceph-as-a-datastore-and-ipoib-as-backend-network/

Questions:

  1. What is the purpose of using CephFS (second link)? I am experienced in Ceph deployment and usage with multiple small-to-medium clusters. If I understand the docs correctly, CephFS is used for swap and temporary files. We planned to use local storage (144 GB SAS RAID1) on the OpenNebula compute nodes; is there anything wrong with that? Does it lack any important feature? What about (live) migration between nodes?
    During my tests with CephFS and a folder mounted on 20 or more clients, cache pressure became a growing problem. I would like to set CephFS aside.

  2. What is meant by “BRIDGE_LIST”? Are these the hosts which clone an image from datastore C (NFS) to datastore A (Ceph RBD)? We do not plan to deploy anything other than Ceph-based RBDs.

  3. Do I need to create one system DS and one image DS per pool (4x DS)?

Any info / help on ceph DS would be appreciated.
Thank you very much!

Kind regards,
Kevin

Hello,

  1. BRIDGE_LIST is there so the frontend can reach your datastore(s). For example, in my case I have three compute nodes (node1, node2, node3) and one frontend (engine).

The nodes have shared storage like yours; I have FC, you have Ceph. In BRIDGE_LIST I have the value "node1 node2 node3", and the frontend has /etc/hosts records with the IP addresses of node1…node3. The frontend logs in to those nodes via SSH as the oneadmin user and monitors the datastores. The list is also used for other datastore operations (there is a rough template sketch after this list).

  2. In my case I use the image DS for persistent images and the system DS for non-persistent images and deployment files. So I think you can have just one system DS and many image DS; it depends on how you want to use it.
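
To make that a bit more concrete, here is a rough sketch of a Ceph image datastore template with BRIDGE_LIST, based on the 5.4 guide linked above. Treat it as a sketch only: the datastore name, pool name, monitor hosts and secret UUID are placeholders for your cluster, so check every attribute against the docs.

    # ceph_hdd_img.conf -- sketch of a Ceph image datastore (placeholder values)
    NAME        = "ceph_hdd_img"
    DS_MAD      = ceph
    TM_MAD      = ceph
    DISK_TYPE   = RBD
    POOL_NAME   = one_hdd                 # your HDD-backed RADOS pool
    CEPH_HOST   = "mon1:6789 mon2:6789 mon3:6789"
    CEPH_USER   = libvirt
    CEPH_SECRET = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    # nodes the frontend can reach over SSH (as oneadmin) for monitoring
    # and image operations such as import/clone:
    BRIDGE_LIST = "node1 node2 node3"

You would register it with "onedatastore create ceph_hdd_img.conf" and create a second image datastore the same way with POOL_NAME pointing at the flash pool.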

Hi!

Thanks for your feedback!

Just to check if I understood correctly:
BRIDGE_LIST is a list of nodes capable of using the specified Ceph pools.
Example: I have two nodes (A + B) connected to Ceph and one that is not (C). During HA/failover only nodes A + B will be chosen, because they are listed in BRIDGE_LIST.

Is this correct?

Ceph is primarily used here as an RBD store, while CephFS is a full POSIX filesystem. CephFS works well for large files served to a small number of clients, but in my tests it became nearly unusable with more than 20 clients. The cluster I am planning will have more than 50 compute nodes, so I would like to avoid CephFS if possible.

Can someone share details of a larger installation?

After reading the docs over and over again, it seems like the system files (swap, etc.) will be created on the system DS, which itself can also be Ceph RBD…
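
In case it helps anyone else reading this, my current understanding of such a Ceph-backed system DS as a template is the sketch below; attribute names are taken from the 5.4 guide, all values are placeholders for my cluster:

    # ceph_ssd_sys.conf -- sketch of a Ceph system datastore on the flash pool
    NAME        = "ceph_ssd_sys"
    TYPE        = SYSTEM_DS
    TM_MAD      = ceph
    DISK_TYPE   = RBD
    POOL_NAME   = one_ssd
    CEPH_HOST   = "mon1:6789 mon2:6789 mon3:6789"
    CEPH_USER   = libvirt
    CEPH_SECRET = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    BRIDGE_LIST = "node1 node2 node3"

If I read the driver docs right, a volatile disk such as DISK = [ TYPE = swap, SIZE = 2048 ] in a VM template would then be created as an RBD volume in that pool instead of a local file, but please correct me if I am wrong.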

Hello, the bridge list is just for monitoring and operations on the datastore. When there is an action (monitor, image create, copy, etc.), oned randomly selects one of the nodes in the bridge list and runs the operation on it. It has nothing to do with HA. If you want VMs redeployed on host failure, you have to add the compute nodes to the same cluster and set up the host error hook.
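
The hook itself is configured in oned.conf on the frontend. From memory of the 5.x sample config it looks roughly like the sketch below; the -m option is meant to move/resubmit the VMs to another host (which needs shared storage, fine in your Ceph case), but please check the fault tolerance guide of your version for the exact arguments:

    # /etc/one/oned.conf on the frontend -- host failure hook (sketch)
    HOST_HOOK = [
        NAME      = "error",
        ON        = "ERROR",
        COMMAND   = "ft/host_error.rb",
        ARGUMENTS = "$ID -m -p 5",
        REMOTE    = "no" ]

Restart oned (systemctl restart opennebula) after changing it.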

Ok, sorry, I knew that, but my example was not a good one. Thanks for the hint!