Sunday, September 24, 2023

FUCKING SPECTACLE!!!

I was using Spectacle on KDE to record my screen and found that its memory usage grew by approximately 500MB/s. It took less than a minute to use up 20GB of memory, and it would have kept eating more if I hadn't killed it in time.
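
To catch this kind of runaway growth before it takes the whole machine down, a small watcher along these lines might help. It is only a sketch: it assumes Linux (it reads VmRSS from /proc/<pid>/status), and the function names and the 8GB limit are made up for illustration.

```python
import os
import signal
import time

def parse_vmrss_kb(status_text):
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None

def watch_and_kill(pid, limit_kb=8 * 1024 * 1024, interval=1.0):
    """Poll a process; send SIGTERM once its resident memory passes limit_kb.

    Returns the RSS (kB) that triggered the kill, or None if the process
    exited on its own first.
    """
    while True:
        try:
            with open(f"/proc/{pid}/status") as f:
                rss_kb = parse_vmrss_kb(f.read())
        except (FileNotFoundError, ProcessLookupError):
            return None  # process is already gone
        if rss_kb is not None and rss_kb > limit_kb:
            os.kill(pid, signal.SIGTERM)
            return rss_kb
        time.sleep(interval)
```

At 500MB/s a one-second polling interval overshoots the limit by about half a gigabyte, so the limit should leave some headroom.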

Twice now my PC has crashed while using this tool and now I know the reason! Fucking Spectacle eats up all my RAM!!

Fucking KDE!!!

How to deal with "I/O error" when you can't copy a file

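When cp can't copy a file because of bad disk sectors (it says "Input/output error"), use dd with conv=noerror,sync instead. noerror makes dd keep going after a read error, and sync pads each failed or short block with zeros so the rest of the data stays at the right offset.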

See https://serverfault.com/questions/489696/recovering-a-file-with-bad-blocks-in-the-middle
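
For illustration, here is roughly what dd is doing with conv=noerror,sync, written as a minimal Python sketch. The function name salvage_copy is my own, and this is not a replacement for dd or for a proper recovery tool; it just shows the idea of reading block by block and zero-filling whatever can't be read.

```python
import os

def salvage_copy(src_path, dst_path, block_size=4096):
    """Copy src to dst one block at a time, zero-filling blocks that fail to read.

    Returns the byte offsets of the blocks that could not be read.
    """
    bad_offsets = []
    size = os.path.getsize(src_path)
    # buffering=0 so each read() asks the OS for at most one block
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        offset = 0
        while offset < size:
            want = min(block_size, size - offset)
            try:
                src.seek(offset)
                data = src.read(want) or b""
            except OSError:
                data = b""
                bad_offsets.append(offset)
            # pad failed or short reads with zeros so later data keeps its offset
            dst.write(data.ljust(want, b"\0"))
            offset += want
    return bad_offsets
```

Unlike dd's sync, this sketch does not pad the final partial block up to a full 4k, so the output has the same size as the input.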

Fucking SSD is failing

$ dd if=./compressed.rar of=~/Desktop/recovered.rar bs=4k conv=noerror,sync
dd: error reading './compressed.rar': Input/output error
498281+0 records in
498281+0 records out
2040958976 bytes (2.0 GB, 1.9 GiB) copied, 7.10046 s, 287 MB/s
dd: error reading './compressed.rar': Input/output error
498296+1 records in
498297+0 records out
2041024512 bytes (2.0 GB, 1.9 GiB) copied, 7.36989 s, 277 MB/s
dd: error reading './compressed.rar': Input/output error
498296+2 records in
498298+0 records out
2041028608 bytes (2.0 GB, 1.9 GiB) copied, 7.55401 s, 270 MB/s
dd: error reading './compressed.rar': Input/output error
498310+3 records in
498313+0 records out
2041090048 bytes (2.0 GB, 1.9 GiB) copied, 7.74294 s, 264 MB/s
dd: error reading './compressed.rar': Input/output error
498330+4 records in
498334+0 records out
2041176064 bytes (2.0 GB, 1.9 GiB) copied, 7.94098 s, 257 MB/s
dd: error reading './compressed.rar': Input/output error
498660+5 records in
498665+0 records out
2042531840 bytes (2.0 GB, 1.9 GiB) copied, 8.58642 s, 238 MB/s
dd: error reading './compressed.rar': Input/output error
498675+6 records in
498681+0 records out
2042597376 bytes (2.0 GB, 1.9 GiB) copied, 9.14104 s, 223 MB/s
dd: error reading './compressed.rar': Input/output error
498690+7 records in
498697+0 records out
2042662912 bytes (2.0 GB, 1.9 GiB) copied, 9.61404 s, 212 MB/s
dd: error reading './compressed.rar': Input/output error
498705+8 records in
498713+0 records out
2042728448 bytes (2.0 GB, 1.9 GiB) copied, 9.98402 s, 205 MB/s
dd: error reading './compressed.rar': Input/output error
499040+9 records in
499049+0 records out
2044104704 bytes (2.0 GB, 1.9 GiB) copied, 10.5384 s, 194 MB/s
dd: error reading './compressed.rar': Input/output error
499055+10 records in
499065+0 records out
2044170240 bytes (2.0 GB, 1.9 GiB) copied, 11.0082 s, 186 MB/s
dd: error reading './compressed.rar': Input/output error
499086+11 records in
499097+0 records out
2044301312 bytes (2.0 GB, 1.9 GiB) copied, 11.3772 s, 180 MB/s
dd: error reading './compressed.rar': Input/output error
499826+12 records in
499838+0 records out
2047336448 bytes (2.0 GB, 1.9 GiB) copied, 11.7863 s, 174 MB/s
567918+14 records in
567932+0 records out
2326249472 bytes (2.3 GB, 2.2 GiB) copied, 15.741 s, 148 MB/s

Got 13 I/O errors reading a single 2.2GB file, wtaf. 

Interestingly, it seems all of the errors were concentrated in a single region, from byte 2040958976 (first error) to byte 2047336448 (13th error)?

If so, then one stretch of roughly 6MB is damaged, with 13 unreadable 4k blocks scattered inside it.
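
To double-check the span, the offsets can be read straight off the dd output: the "records out" count printed after each error, times the 4k block size, is the offset where the failed read started. A minimal sketch of that arithmetic (error_offsets is my own helper name, and the sample log contains only the first and last error entries from the output above):

```python
import re

def error_offsets(dd_log, block_size=4096):
    """Return the byte offset of every failed read reported in dd's output."""
    offsets = []
    pending = False  # True after an error line, until its "records out" line
    for line in dd_log.splitlines():
        if "error reading" in line:
            pending = True
            continue
        match = re.match(r"(\d+)\+\d+ records out", line.strip())
        if pending and match:
            offsets.append(int(match.group(1)) * block_size)
            pending = False
    return offsets

log = """dd: error reading './compressed.rar': Input/output error
498281+0 records in
498281+0 records out
2040958976 bytes (2.0 GB, 1.9 GiB) copied, 7.10046 s, 287 MB/s
dd: error reading './compressed.rar': Input/output error
499826+12 records in
499838+0 records out
2047336448 bytes (2.0 GB, 1.9 GiB) copied, 11.7863 s, 174 MB/s
"""
offs = error_offsets(log)
print(offs[0], offs[-1], offs[-1] - offs[0])  # prints 2040958976 2047336448 6377472
```

So the first and last errors are about 6.4MB apart, while the zero-filled data itself is only 13 x 4k = 52k.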

This is an interesting data point. I guess I should plan for such things for ECCTool.

EDIT: It seems like the data is fine, only the recovery record was corrupted.

Tuesday, September 19, 2023

Managed to recover corrupted VirtualBox VM

Today I was using my VirtualBox VM to download some files, and that used up all the disk space on the physical hard drive where the VM was stored. This caused the VMs to halt with an error: basically they just got stuck and couldn't do anything.

"Ok, no problem" I thought, I'll just copy over the VM to my bigger hard drive and resume it. That was a dumb decision in hindsight.

The VirtualBox VM just got stuck doing something, not sure exactly what it was doing but I couldn't shut it down. In the end I used kill -9 to kill it, thinking "I've copied over all the files so it should be fine". But, that was incredibly dumb to do in hindsight.

The vbox files (XML) were somehow truncated. The .vbox-prev files were also truncated in exactly the same way. Oh shit. 

Anyway, I did something very stupid, which was to try to start the VM from the previous vbox files. One of them did work, but I believe it ended up overwriting the "latest" state (contained in the "latest" snapshot, which was a VDI file). And I ended up losing multiple months' worth of data.

I also tried to create a new VM using the existing VDI disk image, but that disk image turns out to be super super old, like more than a year old, so that didn't work.

I think the problem is that I used the snapshot feature, which makes restoring VMs a huge pain in the ass. I made a snapshot when I was upgrading the OS to the next version, which I think was sensible at the time, but I didn't know that snapshots are really fragile and prone to causing data loss.

Anyway, so it turns out that all of the new data is stored in snapshots. There was a 62GB snapshot which I had overwritten because I started the VM from an old vbox file. But, luckily, there was another snapshot VDI file which was from the time the VM got stuck, so if I could recover from that then I could basically recover all of the data.

Anyway, recovering from snapshots was not easy.

Actually, first I tried to fix my vbox file by editing the XML, closing the tags and so on. That didn't work, although it gave me some idea of how the information about the disk images and snapshots was organized, where they were etc.

Next, I googled and read the VirtualBox forums - it turns out someone had encountered the truncated vbox file problem before, though the answers were not encouraging at all.

But anyway, I somehow came across a comment from someone who mentioned the vboxmanage clonehd command, which he says FLATTENS snapshots into a single VDI file! This was exactly what I needed! I just need to run clonehd on the snapshot that I want, and it will (I presume) automatically resolve all the snapshot's parent snapshots to produce a single VDI image of the entire state at that snapshot.
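
For reference, the call looks roughly like this, wrapped in Python so it can be scripted. This is a sketch under assumptions: the paths passed in are whatever your snapshot and output files happen to be, the helper names are mine, and newer VirtualBox releases document the same operation under the name clonemedium.

```python
import subprocess

def clonehd_command(snapshot_vdi, output_vdi):
    """Build the VBoxManage call that clones one disk image into a new standalone VDI."""
    return ["VBoxManage", "clonehd", snapshot_vdi, output_vdi, "--format", "VDI"]

def flatten_snapshot(snapshot_vdi, output_vdi):
    """Run the clone; raises CalledProcessError if VBoxManage reports an error."""
    subprocess.run(clonehd_command(snapshot_vdi, output_vdi), check=True)
```

Usage would be something like flatten_snapshot("Snapshots/some-snapshot.vdi", "flattened.vdi"), with both file names being placeholders.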

Anyway, I tried it on the latest snapshot and, as I expected, because I had overwritten the "latest" state by starting from an old vbox file, the resulting VDI image had stale data from around 18 days earlier, so I had lost almost 3 weeks' worth of data.

More importantly, that was the time I upgraded the OS version!! I remember that upgrading from Ubuntu 21.10 to 22.04 was a huge pain in the ass because all the sources had expired, and I had to edit a whole bunch of config files, install packages, modify sources.list etc. I can't remember exactly, but I had to google a lot of stuff, and now I would have to go through all that trouble again. Plus I had lost 18 days' worth of browsing history, which was very painful for me, so I was determined to do everything within my power to get that data back.

Anyway, so I saw that snapshot file that was last modified at around the time the VM got stuck rather than later (which was when I stupidly overwrote the "latest" state by opening an old vbox file), and I decided to try my luck "flattening" the snapshot into a VDI disk file that I could import in a new VM.

However, when I tried to do that, it gave me an error:

VBoxManage: error: Parent medium with UUID {} of the medium is not found in the media registry (.config/VirtualBox/VirtualBox.xml)

I tried to get around this error by copying and pasting from the truncated vbox file into the media registry file but it didn't work. I think you're not supposed to directly edit the VirtualBox.xml file and if you do it won't work...

Well, how did I get into this mess? Oh right, by opening the old vbox file instead of the latest. No, that wasn't the actual error. 

My real error was not making an additional copy of the VM directory before accessing the files.

When the VirtualBox program opens the vbox file, of course it modifies the existing VDI files - the snapshots, the hard disk image etc.

I should have made an additional copy of all the files before doing anything with them.

I was too careless - I got complacent even though trying to move VirtualBox files had burned me really badly before.

ALWAYS ALWAYS make an additional copy of the copied-over VM files before you try to open them.
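
A minimal sketch of that habit, so it takes one call instead of willpower. The function name and the ".untouched" suffix are made up; the point is just that the copy is made before VirtualBox ever sees the files, and that an existing backup is never overwritten.

```python
import shutil
from pathlib import Path

def backup_vm_dir(vm_dir, suffix=".untouched"):
    """Copy a VM directory to a sibling directory and return the copy's path."""
    src = Path(vm_dir)
    dst = src.with_name(src.name + suffix)
    shutil.copytree(src, dst)  # raises FileExistsError if dst already exists
    return dst
```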

Anyway that was the hard lesson learned.

But let's go back to my very lucky, miraculous recovery of the VM.

So how did I manage to resolve the error in the end? I'm not 100% sure. I tried to create a new VM using the parent snapshot of the snapshot that I was trying to clone, but basically nothing happened. But I also used the VBoxManage tool to create an image from that parent snapshot, and after doing that, I was able to run vboxmanage clonehd on the snapshot that I couldn't clone before. I still don't know exactly how I fixed it, but now it worked, and the cloned (flattened) VDI image had the latest data in it.

So, that was lucky. Although it took me a few hours, I managed to recover all of the data in the end.

Lesson learned: NEVER EVER COPY A VM WHILE IT IS STILL RUNNING. IT WILL 100% BE CORRUPTED.

AND ALWAYS ALWAYS make an additional copy of the copied-over VM files before you try to open them.
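
One way to enforce the first lesson is to ask VirtualBox which VMs are running before copying anything. A sketch, assuming VBoxManage is on the PATH and prints one "name" {uuid} pair per line for `list runningvms`; the helper names are my own.

```python
import re
import subprocess

def parse_running_vms(listing):
    """Extract VM names from `VBoxManage list runningvms` output."""
    return re.findall(r'^"(.*)" \{[0-9a-fA-F-]+\}$', listing, flags=re.MULTILINE)

def vm_is_running(name):
    """Return True if VirtualBox currently lists the named VM as running."""
    out = subprocess.run(
        ["VBoxManage", "list", "runningvms"],
        capture_output=True, text=True, check=True,
    ).stdout
    return name in parse_running_vms(out)
```

A copy script could then refuse to start while vm_is_running("my-vm") is true, where "my-vm" stands for whatever the VM is called.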

Saturday, September 2, 2023

Wow, changing user agent made it slow

So I'm scraping this website. If I use headless mode it takes like 1.3 seconds, but the page says "Please wait..." and the HTML looks like a security check.

If I set a regular browser user agent then it takes like 4.8 seconds and now it says "Welcome to...".

So probably it is noticing the headless user agent and redirecting me to the security page instead of rendering the actual page.

Funny thing is that the non-headless mode takes the exact same amount of time (4.8 seconds) to load the page. 

So headless mode isn't actually any faster than non-headless Chrome. The 1.3 seconds was just the security page loading instead of the real one.
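
The comparison can be reproduced with something like the sketch below. It assumes Selenium with Chrome from Python, which this post doesn't spell out; the user agent string is just an example of a desktop one, and the helper names are mine.

```python
import time

# Example desktop user agent; substitute the one your normal browser sends.
DESKTOP_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36")

def chrome_arguments(headless=True, user_agent=None):
    """Build the Chrome command-line switches for one test run."""
    args = []
    if headless:
        args.append("--headless")
    if user_agent:
        args.append(f"--user-agent={user_agent}")
    return args

def timed_load(url, headless=True, user_agent=None):
    """Load url in Chrome via Selenium; return (seconds, page title)."""
    from selenium import webdriver  # third-party: pip install selenium

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(headless, user_agent):
        options.add_argument(arg)
    driver = webdriver.Chrome(options=options)
    try:
        start = time.perf_counter()
        driver.get(url)
        return time.perf_counter() - start, driver.title
    finally:
        driver.quit()
```

Calling timed_load(url), timed_load(url, user_agent=DESKTOP_UA) and timed_load(url, headless=False) on the same page gives the three cases compared above. Checking the returned title matters, because a fast load of the wrong page is not a speedup.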

Headless mode does use significantly less RAM though:

Non-headless:

779MB - 490MB = 289MB

Headless:

764MB - 612MB = 152MB

Almost half the RAM usage!

There is some variance though.

Selenium Firefox uses so much RAM wtf

Each Selenium Firefox window uses like 400MB of RAM.

Headless mode reduced that to 250-300MB but it's still a lot.

Let's try using tabs instead of windows, even though windows are more convenient...

UPDATE: I tried the Chrome driver instead of Firefox and the Chrome version used barely any RAM, like 10MB...wtf...

The Chrome version also ran in ~3.6s instead of ~5.7 seconds for the Firefox version...wtf
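
Per-browser numbers like these depend a lot on what gets counted, since a browser runs as a parent process plus several child processes. A sketch for summing resident memory over a whole process tree, assuming Linux and /proc; the function names are mine, and because shared pages are counted once per process the total is an upper bound rather than an exact figure.

```python
import os

def parse_ppid(stat_text):
    """Return the parent PID from the text of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces or
    # parentheses, so split after the last ")": the fields are then state, ppid, ...
    after_comm = stat_text.rsplit(")", 1)[1].split()
    return int(after_comm[1])

def descendants(root_pid, ppid_of):
    """Return root_pid plus every PID whose ancestor chain leads back to it."""
    found = {root_pid}
    changed = True
    while changed:
        changed = False
        for pid, ppid in ppid_of.items():
            if ppid in found and pid not in found:
                found.add(pid)
                changed = True
    return found

def tree_rss_kb(root_pid):
    """Sum VmRSS (kB) over a process and all of its descendants."""
    ppid_of = {}
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            try:
                with open(f"/proc/{entry}/stat") as f:
                    ppid_of[int(entry)] = parse_ppid(f.read())
            except OSError:
                continue  # process exited while we were scanning
    total = 0
    for pid in descendants(root_pid, ppid_of):
        try:
            with open(f"/proc/{pid}/status") as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        total += int(line.split()[1])
        except OSError:
            continue
    return total
```

Passing the PID of the driver process (geckodriver or chromedriver) as root_pid should cover the browser it launched as well, as long as the browser is a descendant of the driver.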