Update about Recovery

It has been a few days since my last post, so I would like to give some insight into what exactly is happening behind the scenes and an update on the progress. I'm very grateful for the support I have received so far. Also, feel free to comment or ask in the Discord if you are interested in the details of a recovery step or have any other questions.

Since I brought the temporary page back up, I have been working on recovering the Cassandra database where all the found hashes are stored. The first steps were fairly easy: most of the sstable files could be loaded.

What I am left with now is one remaining sstable. The issue is that it is the largest of them (a 300GB data file), holding approximately 90% of all the found data (the other sstables were only a few GB each). And this large sstable is corrupted (probably a few wrong bytes in a few places).

The Cassandra online scrubbing process refuses to handle the file. After exploring some other ways to load the data (e.g. sstableloader), the only solution left would be the offline scrubbing tool (sstablescrub). This tool could process and repair the file, losing only the entries that are actually corrupted. The downside is that sstablescrub is VERY slow. Based on the speed I saw on one of the smaller files I had to repair with it, processing the 300GB data file would take close to one year!
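For anyone curious how such an estimate comes about, here is a rough sketch of the arithmetic in Python. The measurements for the smaller file are not given in this post, so the sample values below (2 GB repaired in 58 hours) are purely hypothetical placeholders, and the calculation assumes that scrubbing time grows linearly with file size.

```python
def estimate_scrub_days(sample_gb, sample_hours, target_gb):
    """Linearly extrapolate the scrub time of a large file from a smaller sample."""
    if sample_gb <= 0:
        raise ValueError("sample_gb must be positive")
    hours_per_gb = sample_hours / sample_gb
    return hours_per_gb * target_gb / 24.0


# Hypothetical sample measurement, NOT the real numbers from the recovery.
days = estimate_scrub_days(sample_gb=2.0, sample_hours=58.0, target_gb=300.0)
print(f"Estimated scrub time: {days:.1f} days")
```

With real measurements plugged in, the same three-line calculation shows quickly whether a tool is fast enough to be an option at all.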

Therefore, the scrubbing option is definitely out, as such a long downtime is not feasible. So the current idea is to write a custom tool (I am getting some help with this) that extracts the data from the sstable faster. It does not need to be as sophisticated as sstablescrub, since the table structure is fairly simple and the data is not very heterogeneous. The extracted data would then simply be re-imported into the new database.
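To illustrate the general idea of such a salvage tool, here is a minimal Python sketch. It does NOT parse the real Cassandra sstable format, and it is not the tool being written. It assumes a made-up record layout (a two-byte marker, a length, a payload of "hash<TAB>plain" and a CRC32 checksum) and shows only the technique: read records one by one, keep those that pass their checksum, and when a record is damaged, step forward and resynchronise on the next marker. All names here (MAGIC, encode_record, salvage) are hypothetical.

```python
import csv
import hashlib
import io
import struct
import zlib

MAGIC = b"\xca\xfe"             # hypothetical record marker, not part of any sstable format
HEADER = struct.Struct(">2sH")  # marker + payload length (big-endian)
CRC = struct.Struct(">I")       # CRC32 of the payload


def encode_record(hash_hex, plain):
    """Build one record in the made-up format (used only to create demo data)."""
    payload = f"{hash_hex}\t{plain}".encode("utf-8")
    return HEADER.pack(MAGIC, len(payload)) + payload + CRC.pack(zlib.crc32(payload))


def salvage(data):
    """Yield (hash, plain) for every record that passes its checksum; skip the rest."""
    pos = 0
    while True:
        pos = data.find(MAGIC, pos)
        if pos < 0 or pos + HEADER.size > len(data):
            return  # no further marker, or not enough bytes left for a header
        _, length = HEADER.unpack_from(data, pos)
        payload_start = pos + HEADER.size
        payload_end = payload_start + length
        record_end = payload_end + CRC.size
        if record_end <= len(data):
            payload = data[payload_start:payload_end]
            (stored_crc,) = CRC.unpack_from(data, payload_end)
            if zlib.crc32(payload) == stored_crc:
                try:
                    hash_hex, plain = payload.decode("utf-8").split("\t", 1)
                except (UnicodeDecodeError, ValueError):
                    pass  # checksum matched but the content is not a valid row
                else:
                    yield hash_hex, plain
                    pos = record_end
                    continue
        # Damaged record: move one byte on and look for the next marker.
        pos += 1


# Demo: three records, with one byte of the second record flipped.
plains = ["hello", "world", "foo"]
records = [(hashlib.md5(p.encode("utf-8")).hexdigest(), p) for p in plains]
blob = bytearray(b"".join(encode_record(h, p) for h, p in records))
corrupt_at = len(encode_record(*records[0])) + HEADER.size + 3
blob[corrupt_at] ^= 0xFF

recovered = list(salvage(bytes(blob)))

# Write the surviving rows as CSV, ready for a later re-import.
out = io.StringIO()
csv.writer(out).writerows(recovered)
print(out.getvalue())
```

In this demo only the damaged second record is lost. A file of 300GB would of course not be read into memory in one piece like this; the sketch ignores that and only shows the skip-and-resynchronise logic.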

Re-importing the extracted data (which will be in CSV format) will take a few days, maybe up to a week. So if this custom tool is able to extract the data we are missing, we are well on track to have hashes.org fully restored in the coming weeks.

