
Orphaned disks in OVM and what to do with them

Some time ago I was doing maintenance on an OVM cluster and noticed that it had a significant number of disks without a mapping to any virtual machine (I should mention that the cluster was home to more than 400 VMs). With about 1800 virtual disks, it was easy to miss some lost disks that were not mapped to any VM. Some of them had been created on purpose and were possibly forgotten, but most looked like leftovers from an automatic deployment. I attached several of the disks to a test VM and checked the contents:
[root@vm129-132 ~]# fdisk -l /dev/xvdd
 
 Disk /dev/xvdd: 3117 MB, 3117416448 bytes, 6088704 sectors
 Units = sectors of 1 * 512 = 512 bytes
 Sector size (logical/physical): 512 bytes / 512 bytes
 I/O size (minimum/optimal): 512 bytes / 512 bytes
 
 [root@vm129-132 ~]# dd if=/dev/xvdd bs=512 count=100 | strings 
 100+0 records in
 100+0 records out
 51200 bytes (51 kB) copied, 0.00220462 s, 23.2 MB/s
 [root@vm129-132 ~]#
 
I also checked other attributes of the disks from the OVM CLI:
OVM> show VirtualDisk id=0004fb0000120000d5e0235900f63355.img
 Command: show VirtualDisk id=0004fb0000120000d5e0235900f63355.img
 Status: Success
 Time: 2017-05-19 09:11:26,664 PDT
 Data: 
  Absolute Path = nfsserv:/data1/vms/ovs/crepo1/VirtualDisks/0004fb0000120000d5e0235900f63355.img
  Mounted Path = /OVS/Repositories/0004fb0000030000998d2e73e5ec136a/VirtualDisks/0004fb0000120000d5e0235900f63355.img
  Max (GiB) = 2.9
  Used (GiB) = 0.0
  Shareable = Yes
  Repository Id = 0004fb0000030000998d2e73e5ec136a [crepo1]
  Id = 0004fb0000120000d5e0235900f63355.img [6F4dKi9hT0cYW_db_asm_disk_0 (21)]
  Name = 6F4dKi9hT0cYW_db_asm_disk_0 (21)
  Locked = false
  DeprecatedAttrs = [Assembly Virtual Disk]
 OVM> 
 
The disk was completely empty and, judging by the name and one of the deprecated attributes, it was clearly a leftover from a deployed assembly. I remembered an old issue where shared disks were not deleted if you deployed one of the Oracle RAC assemblies through the Oracle Enterprise Manager Self Service Portal (OEM SS) and later deleted it. It was noticed on OVM 3.2.x with OEM 12c: if you had two or more VMs working with the same shared disks, those shared disks were not deleted when all the VMs and their local disks had been destroyed. The issue was fixed a long time ago, but the orphaned disks were left behind. I created a script to find all the disks without a mapping to any existing VM. The script is written in expect and uses the OVM CLI over SSH. To run it, you need an SSH connection to the OVM Manager on port 10000 and expect installed on your machine. I used one of the Oracle sample scripts as a starting point. Here is the script body:
#!/usr/bin/expect
 
 set username [lindex $argv 0];
 set password [lindex $argv 1];
 set prompt "OVM> "
 
 set timeout 3
 log_user 0
 
 spawn ssh -l $username 10.177.0.101 -p 10000
 expect_after eof {exit 0}
 
 ##interact with SSH
 expect "yes/no" {send "yes\r"}
 expect "password:" {send "$password\r"}
 
 #################### Execute Command passed in ##################
 expect "OVM> "
 set timeout 20
 
 match_max 100000
 
 log_user 0
 send "list virtualdisk\r"
 expect "OVM> "
 set resultdata $expect_out(buffer)
 set resultlength [string length $resultdata]
 set idindex 0
 set id ""
 set done 0
 while {$done != 1} {
     set idindex [string first "id:" $resultdata]
     set nameindex [string first "name:" $resultdata]
     if {$idindex != -1 && $nameindex != -1 && $idindex < $nameindex} {
         # extract the disk id from the "list virtualdisk" output
         set id [string range $resultdata [expr {$idindex+3}] [expr {$nameindex-3}]]
         send "show VirtualDisk id='$id'\r"
         expect "OVM> "
         set getVirtualDiskInfo $expect_out(buffer)
         set getVirtualDiskInfoLength [string length $getVirtualDiskInfo]
         set getVirtualDiskInfoIndex 0
         set getVirtualDiskInfoMapping ""
         set doneProcessingVirtualDisk 0
         while {$doneProcessingVirtualDisk != 1} {
             # a disk with at least one VmDiskMapping is attached to a VM
             set getVirtualDiskInfoIndex [string first "VmDiskMapping" $getVirtualDiskInfo]
             if {$getVirtualDiskInfoIndex != -1} {
                 puts "Disk with mapping:'$id'\r"
                 set doneProcessingVirtualDisk 1
             } else {
                 puts "Disk without mapping:'$id'\r"
                 set doneProcessingVirtualDisk 1
             }
         }
         # move past the current entry and keep scanning the rest of the list
         set resultdata [string range $resultdata [expr {$nameindex+1}] $resultlength]
         set resultlength [string length $resultdata]
     } else {
         set done 1
     }
 }
 
 log_user 1
 
 expect "OVM> "
 send "exit\r"
As you can see, the script is simple enough and doesn't take long to write. I redirected its output to a file in order to analyze it.
[oracle@vm129-132 ~]$ ./dsk_inventory admin password >dsk_inventory.out
 [oracle@vm129-132 ~]$ wc -l dsk_inventory.out 
 1836 dsk_inventory.out
 [oracle@vm129-132 ~]$ grep "Disk without mapping" dsk_inventory.out | wc -l
 482
 [oracle@vm129-132 ~]$ 
 
As you can see, I had 482 orphaned disks out of 1836. That was more than 25% of all disks, and it was not only wasting space but also had a significant impact on interface performance: every time you tried to add, modify or delete a disk through OEM SS, there was a long pause while it retrieved information about the disks. I decided to delete all those disks using the same script, with just a couple of lines added to delete a disk if it doesn't have a mapping. Here is the modified section of the script:
 while {$doneProcessingVirtualDisk != 1} {
     set getVirtualDiskInfoIndex [string first "VmDiskMapping" $getVirtualDiskInfo]
     if {$getVirtualDiskInfoIndex != -1} {
         puts "Disk with mapping:'$id'\r"
         set doneProcessingVirtualDisk 1
     } else {
         puts "Disk without mapping:'$id'\r"
         send "delete VirtualDisk id='$id'\r"
         expect "OVM> "
         set doneProcessingVirtualDisk 1
     }
 }
 
The changes are minimal: the script now sends a "delete VirtualDisk" command to OVM if a disk doesn't have any mapping. Of course, if you want to exclude certain disks, you can add more conditions checking the disk ids to prevent them from being deleted (see the sketch after the error output below). The approach is also reasonably safe, since you are using the approved standard interface, and it will not let you delete a disk that still has an active mapping to a VM. If you try to delete a disk with an active mapping, you will get an error:
OVM> delete VirtualDisk id=0004fb0000120000493379bb12928c33.img
 Command: delete VirtualDisk id=0004fb0000120000493379bb12928c33.img
 Status: Failure
 Time: 2017-05-19 09:28:13,046 PDT
 JobId: 1495211292856
 Error Msg: Job failed on Core: OVMRU_002018E crepo1 - Cannot delete virtual device 6F4dKi9hT0cYW_crs_asm_disk_1 (23), it is still in use by [DLTEST0:vm129-132 ]. [Fri May 19 09:28:12 PDT 2017]
 OVM> 
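For illustration, here is a minimal sketch of how such an exclusion check could look in the deletion branch of the script. The protected_ids variable and the ids inside it are hypothetical placeholders; replace them with the ids of the disks you want to keep.

 # near the top of the script: hypothetical list of disk ids that must never be deleted
 set protected_ids {
     0004fb0000120000aaaaaaaaaaaaaaaa.img
     0004fb0000120000bbbbbbbbbbbbbbbb.img
 }
 
 # in the modified section, the else branch becomes:
 } else {
     if {[lsearch -exact $protected_ids [string trim $id]] != -1} {
         puts "Disk without mapping (excluded):'$id'\r"
     } else {
         puts "Disk without mapping:'$id'\r"
         send "delete VirtualDisk id='$id'\r"
         expect "OVM> "
     }
     set doneProcessingVirtualDisk 1
 }

With this in place, any id found in protected_ids is reported but skipped, while everything else is handled exactly as before.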
 
I ran my script to delete all the non-mapped disks and repeated the inventory script to verify the results. It found a couple of disks that had not been deleted.
[oracle@vm129-132 ~]$ ./del_orph_dsk admin Y0u3uck2 > del_dsk_log.out
 [oracle@vm129-132 ~]$ ./dsk_inventory admin Y0u3uck2 >dsk_inventory_after.out
 [oracle@vm129-132 ~]$ wc -l dsk_inventory_after.out
 1356 dsk_inventory_after.out
 [oracle@vm129-132 ~]$ grep "Disk without mapping" dsk_inventory_after.out | wc -l
 2
 [oracle@vm129-132 ~]$ grep "Disk without mapping" dsk_inventory_after.out 
 Disk without mapping:0004fb0000120000a2d31cc7ef0c2d86.img 
 Disk without mapping:0004fb0000120000da746f417f5a0481.img 
 [oracle@vm129-132 ~]$ 
 
It turned out that those disks no longer had any backing files on the repository filesystem. It looked like the files had been lost some time ago, due to a bug or perhaps some past issues on the filesystem.
OVM> show VirtualDisk id=0004fb0000120000a2d31cc7ef0c2d86.img
 Command: show VirtualDisk id=0004fb0000120000a2d31cc7ef0c2d86.img
 Status: Success
 Time: 2017-05-19 12:35:13,383 PDT
 Data: 
  Absolute Path = nfsserv:/data1/vms/ovs/crepo1/VirtualDisks/0004fb0000120000a2d31cc7ef0c2d86.img
  Mounted Path = /OVS/Repositories/0004fb0000030000998d2e73e5ec136a/VirtualDisks/0004fb0000120000a2d31cc7ef0c2d86.img
  Max (GiB) = 40.0
  Used (GiB) = 22.19
  Shareable = No
  Repository Id = 0004fb0000030000998d2e73e5ec136a [crepo1]
  Id = 0004fb0000120000a2d31cc7ef0c2d86.img [ovmcloudomsoh (3)]
  Name = ovmcloudomsoh (3)
  Locked = false
  DeprecatedAttrs = [Assembly Virtual Disk]
 OVM> delete VirtualDisk id=0004fb0000120000a2d31cc7ef0c2d86.img
 Command: delete VirtualDisk id=0004fb0000120000a2d31cc7ef0c2d86.img
 Status: Failure
 Time: 2017-05-19 12:36:39,479 PDT
 JobId: 1495222598733
 Error Msg: Job failed on Core: OVMAPI_6000E Internal Error: OVMAPI_5001E Job: 1495222598733/Delete Virtual Disk: ovmcloudomsoh (3) from Repository: crepo1/Delete Virtual Disk: ovmcloudomsoh (3) from Repository: crepo1, failed. Job Failure Event: 1495222599299/Server Async Command Failed/OVMEVT_00C014D_001 Async command failed on server: vms01.dlab.pythian.com. Object: ovmcloudomsoh (3), PID: 27092, 
 
  Server error: [Errno 2] No such file or directory: '/OVS/Repositories/0004fb0000030000998d2e73e5ec136a/VirtualDisks/0004fb0000120000a2d31cc7ef0c2d86.img'
 
  , on server: vms01.dlab.pythian.com, associated with object: 0004fb0000120000a2d31cc7ef0c2d86.img [Fri May 19 12:36:39 PDT 2017] 
 OVM> 
 
So, we had information about the disks in the repository database, but the disk images themselves were gone. To make the repository consistent, I created empty files with the same names as the nonexistent virtual disks and then deleted the disks using the OVM CLI interface.
root@nfsserv:~# ll /data1/vms/ovs/crepo1/VirtualDisks/0004fb0000120000da746f417f5a0481.img
 /data1/vms/ovs/crepo1/VirtualDisks/0004fb0000120000da746f417f5a0481.img: No such file or directory
 root@nfsserv:~# touch /data1/vms/ovs/crepo1/VirtualDisks/0004fb0000120000da746f417f5a0481.img
 root@nfsserv:~# 
 
 
 OVM> delete VirtualDisk id=0004fb0000120000da746f417f5a0481.img
 Command: delete VirtualDisk id=0004fb0000120000da746f417f5a0481.img
 Status: Success
 Time: 2017-05-23 07:41:43,195 PDT
 JobId: 1495550499971
 OVM> 
 
I think it is worth checking from time to time whether you have any disks without a mapping to any VM, especially if your environment has a considerable number of disks and a long history of upgrades, updates and high user activity. And now a couple of words about the OVM CLI and using expect for scripting: as you can see, the combination provides a good way to automate your daily routine maintenance on OVM. It would have taken ages to find and clear all those disks manually using the GUI.
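As a hint at how the same skeleton can be reused for other routine tasks, here is a minimal sketch of a generic wrapper that runs a single OVM CLI command passed on the command line. It makes the same assumptions about the OVM Manager address and port as the script above, and the script name ovm_cmd is just an example.

#!/usr/bin/expect
 # usage (hypothetical): ./ovm_cmd <username> <password> "<OVM CLI command>"
 set username [lindex $argv 0]
 set password [lindex $argv 1]
 set command  [lindex $argv 2]
 
 set timeout 3
 log_user 0
 
 spawn ssh -l $username 10.177.0.101 -p 10000
 expect_after eof {exit 0}
 
 ##interact with SSH
 expect "yes/no" {send "yes\r"}
 expect "password:" {send "$password\r"}
 
 expect "OVM> "
 set timeout 20
 match_max 100000
 
 ## run the single command and show its output
 log_user 1
 send "$command\r"
 expect "OVM> "
 send "exit\r"

For example, ./ovm_cmd admin password "list virtualdisk" prints the same disk list that the inventory script parses.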
