So - as I got a bit further with my VM plans, I've come to the point where I have to solve how to access the storage managed by the Leap host from within my Windows VM.
I planned to use BtrFS - and I was able to mount a BtrFS volume within Windows using WinBtrfs - but as I plan to use 8 drives in a RAID6-like setup, after a lot of deliberation I ended up with ZFS and its RAID-Z2, which seems to fit my needs better. As I'm using ZFS on Linux there's also a Windows implementation, ZFS on Windows (but it seems to be an older version, so it doesn't work when the pool is created on Leap - when the pool is created on Windows, though, it also works on Leap). So far I've only tested this with two different VMs accessing the same VHD files - not with Leap on bare metal and qemu running a Windows guest.
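For reference, the RAID-Z2 layout I'm aiming for can be sketched like this (the pool name "tank" and the device names are placeholders for this example, not my actual setup):

```shell
# Sketch: create a RAID-Z2 pool from 8 drives; it survives any 2 drive failures.
# Pool name "tank" and device names are placeholders - adjust for your system.
zpool create tank raidz2 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde \
    /dev/sdf /dev/sdg /dev/sdh /dev/si

# Verify the vdev layout and health:
zpool status tank
```

(In practice you'd use `/dev/disk/by-id/` paths so the pool survives device renumbering.)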
When I searched for this topic on Google it seems that many use SMB shares, as qemu's filesystem "passthrough" only works with Linux guests, not Windows. I also thought about passing through the drives or even the controller to the guest - but then I remembered why I want to switch to Linux on bare metal: to finally get rid of this Windows-only proprietary **** …
So, the simple question: is using SMB the "best" option for accessing a zpool on the host from a Windows client - or is it just the most commonly used one?
Why do I ask this: I've been a Windows kid since the late 90s, so I know there's lots of software which just doesn't play well with a mounted network share instead of a locally attached drive - for whatever reason. Just as an example: modern games still use rather questionable DRM which still has some kernel-level **** in it (anyone remember SecuROM?), and those games simply don't work from a network share, without leaving any useful logs.
iSCSI seems to be in the middle, as it's mounted as a local drive although it runs over the network - but iSCSI also requires exclusive access, so I wouldn't be able to mount the same LUN twice at the same time - although the same is true for the ZFS pool.
The goal is to have access to the filesystem from both the host and the VM at the same time.
Does anyone have any suggestions?
a) It doesn't matter whether I use sshfs-win or the VBox "internal" share - they both end up as a network share, exactly what I'd end up with by just using SMB in the first place - so they have the exact same issue with applications that don't play well with network shares, on top of the additional overhead they both bring.
b) About VBox: the whole point of using KVM/qemu is to pass through my main GPU - VirtualBox doesn't support that kind of passthrough - so VBox is out of the game.
c) Why use a VM in the first place: to run Linux on bare metal and use its capabilities when it comes to combining more than one physical drive into one logical volume - yes, Windows 10 does support this with "Storage Spaces", which, when you fiddle with PowerShell, does have something RAID6-ish - but read up about all the issues with that ****. As with VBox: out of the game as well.
c.2) Using ZFSonWindows - yeah, maybe an option, but it's a few versions behind master - and it's also pretty ugly to use, as it somewhat "emulates" a local drive but uses the network share API (when you open the properties it's like working on a network share - which is even worse, as you can't set ANY file attributes at all). Also: whenever I mount a ZFS volume, Windows keeps telling me "the system configuration changed - please reboot" - and there's no way to configure ZoW so it auto-mounts pools and volumes on boot.
So, yeah, thanks for the input I guess - but those don't fit my needs.
Network share - uses network share protocols like SMB.
Mount a remote network filesystem - NFS, SSHFS.
You also have block-level distributed storage like iSCSI.
And, if you are either very large or have plans to be very large, Ceph.
The above are network based solutions which are flexible and work in a variety of situations both virtual and physical.
They may be affected by network conditions if you’re on a busy network. On a home network, you’ll probably not see any performance issues.
Then you have "Shared Folders", which are implemented by a non-network protocol restricted to Guests and the HostOS they run on. Do not be misled by the fact that you usually access a Shared Folder in Windows as a networked object; that's just a convenient metaphor Users understand… If you sniff your network wire, you won't see any traffic associated with Shared Folders.
If you use kvm-qemu: I wrote the following a while back for setting up Shared Folders, though I've recently heard it may need some adjustment.
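As a rough sketch of the idea (not my original write-up - paths, the socket name and the share tag below are placeholders, and a Windows guest additionally needs the virtio-fs guest driver from the virtio-win package):

```shell
# Sketch: share a host directory with a guest via virtiofs (qemu >= 5.0).
# All paths and names are placeholders; the virtiofsd binary location varies
# by distribution (often /usr/lib/qemu/virtiofsd or /usr/libexec/virtiofsd).
/usr/lib/qemu/virtiofsd --socket-path=/tmp/vfsd.sock -o source=/tank/share &

qemu-system-x86_64 \
    -chardev socket,id=char0,path=/tmp/vfsd.sock \
    -device vhost-user-fs-pci,chardev=char0,tag=hostshare \
    -object memory-backend-memfd,id=mem,size=4G,share=on \
    -numa node,memdev=mem \
    ...   # rest of the VM definition
```

Note that virtiofs requires the shared memory backend shown above, and whether Windows classifies the mounted result as a "local" drive depends on the guest driver, so test that against your DRM concern before committing.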
SMB is known to be a chatty, less performant protocol. If you like SMB features like discovery and organization, go ahead; otherwise other protocols might be considered.
If you prefer or are fluent in ZFS, go ahead; otherwise BTRFS probably has a lower bar for learning… but be aware that there is a BTRFS parity bug affecting RAID levels that use parity (like RAID 5 and 6) in large disk arrays (>4 disks). Instead, you are advised to deploy BTRFS RAID 1, which is mirroring, for large disk arrays: https://en.opensuse.org/User:Tsu2/systemd-1#BTRFS_RAID
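For illustration, a multi-disk BTRFS mirror involves no parity at all (device names below are placeholders):

```shell
# Sketch: BTRFS raid1 across more than 2 disks - every block is stored on
# two of the member devices, so no parity calculation is involved.
# Device names are placeholders.
mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc /dev/sdd /dev/sde

# After mounting, inspect the allocation profile:
mount /dev/sdb /mnt
btrfs filesystem usage /mnt
```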
I must be misunderstanding something; your reference to ZFSonWindows is confusing… Isn't Windows your Guest and openSUSE your HostOS? I wasn't aware that ZFS clusters had any native remote access beyond common mechanisms like ssh, sftp, etc.
Thank you for your additional input. Let me address your questions:
My final plan is to run openSUSE as the host OS on bare metal and run a Windows VM using KVM/qemu. The reason: use the Linux host to manage my multiple physical drives and provide them as one big volume, while my powerful main GPU is passed through to the Windows VM so I can use it for gaming. Why don't I just use Linux for gaming? Because some of my favourite games use DRM technologies which only work on Windows - and they're DirectX-only, with no OpenGL or Vulkan variants which would run natively on Linux. As Windows just doesn't offer any reliable RAID solutions itself, and even implementations like WinBtrfs or ZFSonWindows are limited by Windows itself, it's just not an option for me.
My current setup, using the fakeraid of my SB950/990FX ASUS Crosshair V Formula-Z, limits me to Win7 only, as the RAID driver is only available for Win7. Also: when the board dies I would need another compatible one. Using software-based solutions would free me from that bottleneck.
In addition to that limitation, I've already learned that typical hardware RAID can't handle "silent" bit errors, which can lead to data corruption or loss when the controller thinks the parity is wrong while the actual error comes from a corrupted data block - so the correct parity, which could be used to restore the data, gets overwritten with a corrupted one. BtrFS and ZFS provide additional protection against such errors.
Why I would like to use ZFS rather than BtrFS is a personal choice based on tests I did, as I wasn't able to find objective information that would recommend one over the other for my needs.
So, as I plan to use Linux to manage the storage, I somehow have to make it accessible to the VM so I can reach it from within Windows. My first idea was to use iSCSI, but in my tests I was only able to mount it once at a time, not on both the host and the VM simultaneously. Hence I'm looking for a way to make the data accessible to both systems at the same time.
One of the reasons is to be able to access files from both systems - e.g. some work I have to do can be done on openSUSE, so I don't have to boot up the Windows VM.
SMB sure is an option, but there are some applications and games which, for whatever reason, don't play well with network shares, only locally attached drives. To avoid those issues I thought there might be a way to make a folder on the host available to the guest, but this seems to be limited to Linux guests.
VirtualBox has a way to create "pointer" files which can be added like disk images but actually access physical drives - but as VBox doesn't provide GPU passthrough, it's not an option.
So my qemu setup is similar; I use physical drives for the qemu machines as well as a GPU. In the WinX Pro system, I just use sshfs (or net use); it then shows up as a network drive and is accessible on both the host and the guest. Zero configuration is required to connect, since ssh is already running on the host.
The other thing I do, if no qemu system is running, is unbind the second controller; then I have access to the qemu disks if needed, but mainly to the backup drive I have on that controller. Partitions I don't want to see are hidden with a udev rule.
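For illustration, such a udev rule might look like this (the rule file name and the UUID are placeholders; the `UDISKS_IGNORE` flag hides the partition from desktop automounters):

```shell
# Sketch: hide a partition from udisks/desktop automounting via a udev rule.
# The UUID is a placeholder - get the real one with `blkid`. Requires root.
cat > /etc/udev/rules.d/99-hide-backup.rules <<'EOF'
SUBSYSTEM=="block", ENV{ID_FS_UUID}=="REPLACE-WITH-UUID", ENV{UDISKS_IGNORE}="1"
EOF

# Apply without rebooting:
udevadm control --reload
udevadm trigger
```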
No matter what method you use to grant access to a filesystem from multiple devices, there will always be a potential contention problem (i.e. two machines open the same file, make different changes, and attempt to save them), but each method may address the issue more or less efficiently, automatically, or transparently to the User.
Your iSCSI issue doesn't make sense. It's fundamentally distributed network storage, so multiple machines have access to the same storage pool. It sounds to me like you might have set up two iSCSI targets instead of one target and any number of initiators. Once set up, all hosts should be able to read/write the storage.
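As a hedged sketch of a single-target, multi-initiator setup using targetcli (the IQNs and the backing zvol are placeholders):

```shell
# Sketch: one iSCSI target, one LUN, ACLs for two initiators (targetcli/LIO).
# All IQNs and the backing device are placeholders; the zvol must exist first.
targetcli /backstores/block create name=zvol0 dev=/dev/zvol/tank/lun0
targetcli /iscsi create iqn.2020-01.local.host:storage
targetcli /iscsi/iqn.2020-01.local.host:storage/tpg1/luns \
    create /backstores/block/zvol0
targetcli /iscsi/iqn.2020-01.local.host:storage/tpg1/acls \
    create iqn.2020-01.local.host:initiator1
targetcli /iscsi/iqn.2020-01.local.host:storage/tpg1/acls \
    create iqn.2020-01.local.host:initiator2
targetcli saveconfig
```

One caveat worth stating: even with both initiators logged in, mounting the same ordinary (non-cluster) filesystem read/write from two hosts at once will corrupt it - simultaneous access at the filesystem level normally requires a cluster-aware filesystem such as OCFS2 on top of the shared LUN.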
Be sure you have the right hardware setup before you attempt GPU passthrough… In particular, you need multiple GPUs, because any kind of hardware passthrough grants a Guest exclusive access to that device, removing access from the HostOS.
Regarding BTRFS, I'm not aware of any special parity-bit auto-repair, but there is a feature similar to what you describe for ordinary data storage.
The Shared Folders I describe can also work, and IMO support your stated requirements; they're not limited to Linux guests… As I described, shared folders are typically accessed in Windows Guests as networking objects, but they aren't actually accessed over the network wire… it all happens internally on the HostOS.
I'm aware that accessing one file from two or more processes at the same time can, and often does, lead to data loss or corruption when both write to it.
It's maybe me not finding the right words to explain what I mean by "access from both at the same time". Let me try it this way: if you have a NAS, you can access it from all machines within the network at the same time. Of course, if you have a text file open and write to it from multiple machines, only the last write will survive; any other writes will be lost. Or two writes could actually happen at the same time and the file could end up corrupted, containing partial data from both. So I'm fully aware of that. What I'd like to end up with is one logical volume, but instead of it being provided externally over the network from a NAS, it's directly attached to the host OS itself - in this case the then-current openSUSE version.
It's quite possible I made a mistake in the setup, but I only had one LUN and tried to mount it from both at the same time. The system that connected first got full access to the LUN, but the second system wasn't able to connect at all. As for how I set it up: I used the iSCSI server YaST2 plugin, as I'm at least familiar enough with it to create an accessible LUN. Maybe I made some mistake that prevented simultaneous access; I'm not sure.
I do have a 2nd GPU to run the host OS on, so my main GPU is actually free to be passed through. Currently there's an issue with the board/CPU I own, but a friend of mine may have some spare parts that could work with my plans - as I haven't been able to test yet, though, I can't tell. Maybe I still have to invest in newer hardware, maybe my buddy's parts will do the trick.
Well, at least as far as I understand BtrFS and ZFS from what I've read so far, both provide a RAID-like combination of multiple partitions or even whole drives working together as one logical volume - so basically what any RAID does. But "normal" RAID lacks the ability to detect single-bit errors and determine the actual cause of an error. This can lead to data corruption or loss when a hardware RAID controller or a software RAID implementation decides to recalculate the parity instead of correcting a corrupted data block. Both BtrFS and ZFS use additional checksums to detect whether an error occurred due to faulty parity or a failed data block, and can successfully recover from it.
Why I chose ZFS over BtrFS: in my tests, and from information on the BtrFS mailing list, it's still not ready for a RAID6 configuration. Also, BtrFS has three types of storage blocks: the actual data blocks, metadata blocks, and what's called "system data" blocks. I was advised to use raid6 for the data blocks but what's known as raid1c3 for the metadata - I just really didn't get it, and openSUSE 15.2 is just way too old to support it. I also had quite some trouble recovering from a failed drive (simulated by restarting a VM with one of the drives detached or replaced). ZFS, on the other hand, looks far more robust and stable to me - and is easier to work with, even after a simulated drive swap. As I wasn't able to find any objective information saying "hey, use btrfs with option X", I just chose ZFS as it's more user-friendly to me and doesn't have as steep a learning curve as BtrFS. So: personal choice.
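For reference, the advised BtrFS layout would have looked something like this (device names are placeholders; raid1c3, which keeps three copies of the metadata, needs kernel and btrfs-progs 5.5 or newer - which is why Leap 15.2's stack is too old for it):

```shell
# Sketch of the advised BtrFS layout: raid6 for data, raid1c3 for metadata.
# Requires kernel >= 5.5 and btrfs-progs >= 5.5. Device names are placeholders.
mkfs.btrfs -d raid6 -m raid1c3 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde \
    /dev/sdf /dev/sdg /dev/sdh /dev/sdi
```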
That I'm also aware of. It's not about how the actual implementation works under the hood - but that there's Windows software which just doesn't play nice with net shares. Windows, for whatever stupid reason, does in fact differentiate between several kinds of drives: network shares, removable storage, locally attached drives - the list goes on for a few more types. My very main issue: as the gaming industry decided to come up with anti-piracy stuff that's rather invasive, there are some of these "protections" which, for whatever reason, just do not work when they detect the path to be a UNC network share instead of a local drive - they just don't. And as they don't want to provide any useful information to hackers, they often don't create any logs at all - and those that do only spam them with useless **** which pretty much says: "F* you - I refuse to work". So it doesn't matter if I use an actual external NAS which sends data over the wire, or some local-only setup which just uses network protocols: the main issue is that Windows classifies it as non-locally-attached storage - which triggers some of those anti-piracy protections, which in turn ends in "the game just doesn't even start at all" - the famous "it doesn't work for no reason"… when in fact it's an over-aggressive anti-piracy approach which only allows running from storage that Windows classifies as a "locally attached hard drive". The only thing that comes close is iPXE and its sanboot module. That way iPXE "emulates" a SAN-mounted LUN as a local hard drive, so Windows sees it as such and the stupid anti-piracy DRM **** doesn't trigger but lets the game start and run. It's for that one and only, but very specific, reason that I didn't build an actual NAS but use my onboard fakeraid: cause the game industry suxx…
For sure I would like to move over to Linux - but it's the same problem: a lot of anti-piracy stuff just doesn't run on Linux - and the few games that do are often limited to DirectX and don't have an OpenGL or Vulkan port, so they can't run due to using an unsupported graphics API.
If it weren't for the fact that I'm a gamer in my late 20s stuck on Windows because I like to play games which don't work on network shares or Linux for the mentioned reasons, I would have switched over about a decade ago when I first moved into my first own apartment - but as a "gamer" you're stuck with what the industry provides. Yes, there are efforts like Valve's Proton for Steam to make Windows-only games run natively on Linux - but unfortunately this only covers a few of my favourite games. This is the reason why I still have to use Windows - and as I don't want to run it as the host OS on bare metal anymore, I have to come up with a way to use the power of my GPU inside the VM. The only way this works is with PCIe passthrough via KVM/qemu - which has the downside that I now need a solution for using my 12TB RAID from both the Linux host OS and from within the Windows VM. Using a network share seems the easy way to go - if it worked with the stupid DRM anti-piracy stuff - but that's the next wall I ran into face first.
You see: thanks for all your advice and explanations - I do know most of it already - but most of what you suggest just doesn't fit my needs, for one or more specific reasons caused by some jerk very high up at some game dev studio who just had the idea this morning: "hey, my car broke down - let's make some money so I can buy a new one - and pay CASH". For me it would have been so much better if, about 10-20 years ago, Microsoft had lost its monopoly and the world ran on Linux - then Linux gaming would be big, and big triple-A titles like GTA V would run on it. But history went differently…
So, have I tried your suggestions? Yes. Did they work? Well, basically yes, they do work - but not for what I want to use them for. I still need some way to "mount" a directory on the host OS as a "locally attached drive" in the Windows VM - but with qemu this only works with image files, not with an actual sub-tree of the host.
So what did you try? I can mount a partition/directory on the host as a shared drive with net use and sshfs.
To mount /dev/sda3 stuff on the host, I use:
net use X: \\sshfs\<username>@xxx.xxx.xxx.xxx\..\..\stuff
The only way is a separate controller and physical drives (this is what I do), but at that point there's no access from the host, which is not what you're after. Or you go that way anyway and keep another location for the transient data you want to transfer?
Again I have to repeat myself: it doesn't matter what protocol I use - SMB, sshfs, the VirtualBox share, or any other virtual driver which allows Windows to mount filesystems other than the natively supported ones - it always ends up as a mounted network share. As I already explained: exactly that fact, that those shares are mounted as "non-local" storage, triggers some of the DRM anti-piracy protections. So if I try to run such a game from such a mounted share, it either fails to authenticate - or doesn't start at all.
Sure, another possible idea would be to add an image-based drive and store that image on the zpool - but this would limit me to accessing it only while the VM is shut down. Also, it would be formatted with NTFS - which, correct me if I'm wrong, never reached a point of full reliability on Linux, as the driver is based on reverse engineering instead of public specifications. So if I had work that could be done on Linux without firing up the Windows VM, I'd have to rely on the ntfs-3g FUSE driver not corrupting the image qemu uses to provide the logical volume to the VM.
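Just to make that idea concrete, such an image-backed setup might look like this (the pool name "tank", the dataset, paths, and the size are made-up placeholders):

```shell
# Sketch: back the Windows VM with an image file stored on the ZFS pool.
# Pool/dataset names, path, and size are placeholders.
zfs create tank/vmimages
qemu-img create -f qcow2 /tank/vmimages/win10.qcow2 500G

# Attach it to the guest via virtio for performance (the Windows guest
# needs the virtio storage driver from virtio-win):
# qemu-system-x86_64 ... -drive file=/tank/vmimages/win10.qcow2,if=virtio
```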
TL;DR: yes, using a net share, whether plain SMB or sshfs (which adds ssh overhead), does work for simple file access - but it can cause issues with specific applications, games in particular. It's not that I'm looking for another solution because the ones already mentioned don't work at all - but as I have a specific problem caused by some rather narrow-minded idiots who still think DRM and anti-piracy ****, which even fails for legitimate copies, is the right way to go, and who don't give a **** about any system other than Windows and hence keep using DirectX only - I have to keep looking for some way to:
use my multiple drives in a RAID as one big, fault-tolerant volume
access it from both the host and the guest at the same time
avoid the issues caused by network shares, by figuring out how to mount it in the VM so Windows sees it as a local drive
Yes, I'm aware this is a rather special use case and that finding an answer may not be as easy as searching on Google or asking in a forum - but in the past I've had several moments where some genius came around the corner a few weeks after my initial question. So all I can do is describe my issue as well as possible and answer your additional questions.
So what about the reverse: a separate controller with your ZFS pool, use that controller/disks in the qemu machine, run an ssh server on the Windows qemu machine, and from the host connect to the pool if needed with sshfs? When the Windows machine is not running, unbind the controller (I just script this as part of qemu start-up) and use it as normal on the host?
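A minimal sketch of the unbind/rebind step (the PCI address and driver names below are placeholders - find yours with `lspci -nnk`):

```shell
# Sketch: move a SATA controller between the host's ahci driver and vfio-pci.
# PCI address is a placeholder; requires root and a loaded vfio-pci module.
DEV=0000:03:00.0

# Detach from the host and hand to vfio-pci for the VM:
echo "$DEV" > /sys/bus/pci/drivers/ahci/unbind
echo vfio-pci > "/sys/bus/pci/devices/$DEV/driver_override"
echo "$DEV" > /sys/bus/pci/drivers/vfio-pci/bind

# Reverse when the VM shuts down:
echo "$DEV" > /sys/bus/pci/drivers/vfio-pci/unbind
echo "" > "/sys/bus/pci/devices/$DEV/driver_override"
echo "$DEV" > /sys/bus/pci/drivers/ahci/bind
```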
As pointed out earlier, have you checked that everything will work on your hardware - IOMMU groups etc.?
In order for multiple people to use a single file at the same time, there must be a locking scheme. This has nothing to do with the FS or network connection; it has to do with the software actually accessing the file. I develop DBF-style multi-user programs. These require record-level locking to prevent or delay a second user writing at the same time. This is all controlled by the programmer, applying and removing locks appropriately. OSes can provide locking but in general do not do so auto-magically. Things can go sideways if one program, or neither program, uses locking in a multi-user environment. You can open a file in single-user mode, in which case others cannot write to it. But this all depends on the program being used and its locking scheme.
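A minimal illustration of advisory locking from the shell, using util-linux `flock`: two writers serialize on the same lock file, so their writes never interleave. (File paths are just examples.)

```shell
# Sketch: two writers take an exclusive lock before appending to a shared file.
# Whoever gets the lock first writes first; the other waits instead of clobbering.
LOCK=/tmp/demo.lock
OUT=/tmp/demo.out
: > "$OUT"                                      # start with an empty file

( flock -x 9; echo "writer A" >> "$OUT" ) 9>"$LOCK"
( flock -x 9; echo "writer B" >> "$OUT" ) 9>"$LOCK"

cat "$OUT"
```

Note this is *advisory* locking: it only protects against writers that also take the lock, which is exactly the "depends on the program" point above.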
There is an alternative to locking…
A system of merging rules.
I haven't seen an authoritative paper on how this works, but I assume locks can be avoided by integrating changes instantaneously and incrementally, so that any change has to incorporate even split-second-earlier changes.
An example is the Google realtime collaborative tool, which allows all collaborators to edit the same document in real time. IIRC it was developed from Piratepad, which may not exist anymore (I see a Google version in my search results, though). There are other apps that do the same thing, like Etherpad, and something I haven't used before, Riseup Pad.
I’m aghast, but some SysAdmins I know provide DRM keys to multiple hosts in their networks by way of a network share.
Actual implementation will depend on the DRM, but it’s typically all software and does not involve hardware, so you can re-use the keys on any machine.
Unless you absolutely need the same image and state to be available to all machines, I don't see the need to deploy an entire image or system on common storage. Besides, how many Users are going to access these files at once - just yourself? Multiple individual Users? That's different from you, as a single User, accessing the same data from different machines, not simultaneously.
Unless you’re running on multiple physical machines, you do have a single “always on” machine… the HostOS. So, unless you’re deploying across multiple physical machines, deploying your common storage on the HostOS and not in a VM should solve any availability problems.
If you're chasing performance and want to decrease overhead, I'd recommend iSCSI… you can't do much better than block-level storage. But in a closed network, especially with only 2 machines on it, I wouldn't expect much network latency. If it really is an issue, I'd explore UDP streaming - but without some kind of control protocol, or another mechanism to provide that functionality, YMMV. Keep in mind that when UDP is used in video streaming, it's because intermittent drops are considered preferable in order to ensure smooth, continuous video.
As for your enumerated requirements…
Multiple solutions, take your pick. Some might chase a little performance, but YMMV and if your hardware is capable enough, your choice is insignificant.
Access is never a problem, but writes can be. There are multiple possible solutions; as I described, if it's only between the HostOS and a Guest and not over an external network socket, it's probably as fast as can be, with maybe little difference between the choices.
Windows can map anything as a local drive so isn’t likely an issue at all.
If disk performance is such an issue… I’d recommend two options…
Implement m.2 storage. Combining solid-state storage with the PCIe bus instead of the SATA bus, you'll achieve approximately 7x the performance of the 2.5" form factor. The current state-of-the-art 4th-generation m.2 NVMe disk storage is blindingly fast compared to what existed even 8 months ago. There are now also cards that mount multiple m.2 NVMe drives as RAID 0, if you want to try that.
If you have enough RAM, the tried and true solution is to copy your data to a RAM disk.
Covering some points besides what is in my other posts…
ZFS is the preferred, robust technology for deploying large, fault-tolerant storage pools on RAID, and AFAIK is the only such technology supported by RHEL.
BTRFS has that significant parity bug they can't seem to fix, so large-array RAID 5 and 6 in particular are discouraged. Instead, BTRFS RAID 1 looks like a good workaround: by implementing mirroring in a way that supports more than 2 disks, you get a solution that does not use parity at all. But if you have no hesitation using ZFS, it should be a good choice.
Can’t comment further on your DRM issues because how the DRM in your specific situation works determines what is possible.
Your requirement for making remote storage appear as local storage is not unusual and has always been addressed by simply mapping it to a drive. I don't know of a single remote storage, accessed as a network share over any network protocol or from any network location, that can't be mapped as a local drive.