Accidentally quadratic: rm -rf??

This is a REALLY bad time. Please please PLEASE tell me I'm not fucked…

I've accidentally created about a million files in a directory. Now ls takes forever, Python's os.listdir is faster but still dog-slow – but! but! there's hope – a C loop program using opendir/readdir is reasonably fast.

Now I want to modify said program so that it removes those files which have a certain substring in their name. (I want the files in that directory and its subdirectories, just not the junk I created due to a typo in my code.)

HOW THE FUCK DO I DO THAT IN O(N)??

O(N^2) is easy enough. The unlink function takes a filename, which means that under the hood it reads ALL THE BLOODY FILE NAMES IN THAT DIRECTORY until it finds the right one and THEN it deletes that file. Repeated a million times, that's a trillion operations – a bit less because shit gets divided by 2 in there, but you get the idea.

Now, readdir gives me the fucking inode number. How the FUCK do I pass it back to this piece of shit operating system from hell, WITHOUT having it search through the whole damned directory AGAIN to find  what it just gave me?

I would have thought that rm -rf for instance would be able to deal with this kind of job efficiently. I'm not sure it can. The excise function in the guts of coreutils for instance seems to be using unlinkat which gets a pathname. All attempts to google for this shit came up with advice to use find -inode -exec rm or some shit like that, which means find converts inode to name, rm gets the name, Unix converts the name back to inode…

So am I correct in that:

  • Neither Unix nor the commercial network filers nor nobody BOTHERS to use a hash table somewhere in their guts to get the inode in O(1) from the name (NOT A VERY HARD TASK DAMMIT), and
  • Unix does not provide a way to remove files given inode numbers, but
  • Unix does unfortunately makes it easy enough (O(1)) to CREATE a new file in a directory which is already full of files, so that a loop creating those bloody files in the first place is NOT quadratic?? So that you can REALLY shoot yourself in the foot, big time?

Please tell me I'm wrong about the first 2 points, especially the second… Please please please… I kinda need those files in there…

(And I mean NOTHING, nothing at all works in such a directory at a reasonable speed because every time you need to touch a file the entire directory is traversed underneath… FUUUUUCK… I guess I could traverse it in linear time and copy aside somehow though… maybe that's what I'd do…)

Anyway, a great entry for the Accidentally Quadratic blog I guess…

Update: gitk fires up really damn quickly in that repository, showing all the changes. Hooray! Not the new files though. git citool is kinda… sluggish. I hope there were no new files there…

Updates:

  • `find fucked-dir -maxdepth 1 -name "fuckers-name*" -delete` nuked the bastards; I didn't measure the time it took, but I ran it in the evening, didn't see it finish in the half an hour I had to watch it, and then the files were gone in the morning. Better than I feared it would be.
  • As several commenters pointed out, many modern filesystems do provide O(1) or O(log(N)) access to files given a name, so they asked what my file system was. Answer: fucked if I know, it's an NFS server by a big-name vendor.
  • Commenters also pointed out how deleting a file given an inode is hard because you'd need a back-link to all the directories with a hard link to the file. I guess it makes sense in some sort of a warped way. (What happens to a file when you delete it by name and there are multiple hard links to it from other directories?.. I never bothered to learn the semantics under the assumption that hard links are, um, not a very sensible feature.)

66 comments ↓

#1 Les on 06.24.15 at 6:28 pm

What does `find -delete` do in this case?

#2 Viktor on 06.24.15 at 6:28 pm

I think that if you divide the files in 10 folders then you will have 1/10-th of the original complexity

#3 Yossi Kreinin on 06.24.15 at 6:35 pm

@Les: I don't know. Perhaps I should find out.

@Viktor: yeah, that's what everybody does when they do it ON PURPOSE in produciton, but this here was a fucking one-time ACCIDENT. Now how do I clean it up??

#4 Anonymous on 06.24.15 at 6:45 pm

You didn't say what filesystem this is. Ext4 (and I think even ext3 but not by default) have dir_index support (tune2fs -l /dev/sdaX |grep dir_index), and XFS can similarly deal with large number of files.

Perhaps this will help:
https://github.com/danfruehauf/Scripts/blob/master/spread_files/spread_files.sh

#5 Bub on 06.24.15 at 6:47 pm

Just use rsync to delete all files that match a substring. Rsync caches agressively, so it's much faster than doing find . | grep $keyword | xargs rm or something similar.

#6 Jonathan on 06.24.15 at 6:52 pm

Modern Linuxes do actually use a b-tree for directories, so it's only O(n log n). But still, the whole directory / inode dichotomy in Unix is a mistake. (And no, you can't delete a file by inode number, because a file can have multiple links and the OS would have to keep a list of *all* of them to do the reverse mapping).

#7 Sergey on 06.24.15 at 7:08 pm

Maybe copying the good files to a new directory will be faster.

#8 Max Lybbert on 06.24.15 at 7:18 pm

Is it possible to generate all possible filenames, and call unlink on them? That is, call unlink without first knowing whether the file exists.

I guess it's a question of whether the filenames are predictable beyond "has this substring in its name" and what fraction of possible filenames you actually used (e.g., if you created 900,000 files out of a possible 1,000,000, then that's one thing; if you created 100,000 out of a possible 1,000,000, that's something else).

#9 VLM on 06.24.15 at 7:18 pm

xargs?

ls | grep delete_me | xargs rm

Is xargs with rm quadratic?

Might be interesting/safer to use interim data files in /tmp instead of pipes.

#10 Assaf on 06.24.15 at 9:52 pm

I would be surprised if you could delete by inode alone, since that would leave around a bunch of file names potentially pointing to those inodes. Deleting a file is more than deleting its inode, you have to delete its entry(ies) in the directory, too… no?

#11 ai on 06.24.15 at 10:55 pm

Yossi, assuming you're using a filesystem in the family of ext*fs, have a look at the e2fsprogs source code, and more specifically debugfs/kill_file_by_inode(). Still, be careful not to corrupt your fs!

#12 Karellen on 06.25.15 at 1:13 am

'ls' is dog slow because it reads *all* the directory entries into memory, and then sorts them, before output. Use 'ls -U' to output in directory order as it does the readdir.

#13 Phil Miller on 06.25.15 at 1:21 am

As noted above, deleting by inode would be dangerous to the filesystem's integrity, since there could be other links to that inode. Of course, the kernel could enforce a condition of inode->link_count == 1 to allow it.

More sensible for this case, I think, would be deletion by dirent, which gives some cookie within the directory (d_off), and the name and inode number to either confirm that the right thing is to be deleted or to look up the right thing if the directory's contents changed.

#14 desr on 06.25.15 at 1:34 am

tune2fs -O dir_index, if it's an old ass ext3
newer filesystems are not that stupid

#15 Matt Ahrens on 06.25.15 at 3:22 am

Filename lookup depends on what filesystem you are using. ZFS uses an on-disk hashtable (extendible hash to be specific), so filename lookup is O(1) and "rm -rf" can be O(number files deleted).

Delete by inode number would be much slower, because the filesystem would need to find the directory entry to remove, which AFAIK is at best O(number files in parent directory).

#16 Norman Yarvin on 06.25.15 at 6:44 am

Copy out the files you want to a new filesystem, and nuke the old one from orbit.

That is, assuming you're right about rm being O(N); on Linux all the major filesystems (ext4,xfs,btrfs) use hash tables or trees and thus should be faster than that. Or is this a Mac or some BSD variant?

#17 Arseny Kapoulkine on 06.25.15 at 7:32 am

It does not make too much sense for file creation to be O(1) and file removal to be O(N) – before you create the file you have to check if the file with the same name is already there. Of course it's possible to implement file removal to be O(N) *after* a sub-linear name lookup but it feels unlikely that file systems would do this.

#18 Michael Neumann on 06.25.15 at 10:16 am

Might be much faster, depending on your filesystem, to just remove the whole directory, if that's an option! For instance in case of HAMMER2 filesystem, directory removal is a constant operation.

#19 Darren on 06.25.15 at 6:59 pm

NTFS actually keeps the list of directories a file is in, as well as the names the file has in those directories, in the NTFS-equivalent of the i-node. So deleting by i-node would be entirely doable if UNIX ever had historically tried to support it.

#20 Grogard on 06.26.15 at 4:52 am

Move out the files you want and kill the rest. Alternatively look at the inodes and see the first bad file and directly remove from there.

#21 Alex on 06.26.15 at 5:32 pm

The `dirent` struct contains a field called `d_filename`; see `man 2 readdir`. Also `man 2 unlinkat`.

Re "What happens to a file when you delete it by name and there are multiple hard links to it from other directories?": Files are reference counted. Unlinking simply deletes a reference. If all the references are gone, the file is removed.

#22 Lev Serebryakov on 06.26.15 at 8:03 pm

FreeBSD supports hashed directories on UFS as optimization (old format is still always present, for binary compatibility).

XFS uses btrees. ZFS, I'm pretty sure, too (but need to check).

NFS is very slow anyway, it is almost irrelevant which underlying FS is used by server.

#23 gus3 on 06.29.15 at 1:03 am

If the NFS server's export is on a journaled filesystem, the journal semantics can have a huge impact on FS operations. Think ext3's "data={journal|ordered|writeback}" variations.

In addition, unlinking a file on a journaled FS is one of the most complicated operations there is, managing reference counts, likely (but not guaranteed) block/extent releases, and directory inode modifications. Given that all the unlinks are in the same directory, that's a lot of serialization happening in the journal entries when the journal fills up.

#24 anon on 07.09.15 at 12:41 am

Hahahaa. I used to have this tarball that had special names that collided on the kernel filesystem hash table -> O(n^2). So if you did "rm -rf *" it was meant to take a century, I think. You literally had to re-compile your kernel with a different hash to do kill the files.

I think it's been fixed, but posting as anon just to be sure not to step on too many toes :) The fix was meant to be randomization of a seed in the hash func at kernel startup. The bug was filed but dunno if it got followed up on. Anyway, accidental O(n^2) is _super_ frustrating and the filesystem has weird quirks.

#25 Alexander Riccio on 07.17.15 at 10:11 am

Also relevant: "You can list a directory containing 8 million files! But not with ls.." (intentionally not linked)

#26 Andrew on 01.01.17 at 4:20 am

"What happens to a file when you delete it by name and there are multiple hard links to it from other directories?"

That's the easy one! You don't delete it (by name or otherwise), you unlink it (removing the link with the given name). The file only actually goes away when there are no more links to it and no more file descriptors open to it, as a sort of autonomic process. The system doesn't give you the ability to reach in and delete the "actual file" out from under things that hold references to it.

#27 bisnis online tanpa modal on 04.30.19 at 7:42 am

I was reading through some of your articles on this internet site and I think this website is really informative! Continue posting.

#28 bisnis anak muda yang menjanjikan on 04.30.19 at 10:51 am

I consider something really special in this site.

#29 revolting on 05.11.19 at 2:37 am

Do you have а spam problem on this weƄsite; I also am ɑ blogger,
and I was wondering your situation; many of us have creatеd some nice methods ɑnd we are looking to exϲhangе techniques with other folks, be sure to shoot me
an e-mail if inteгeѕted.

#30 Ethan Bednarowicz on 05.14.19 at 9:44 pm

5/14/2019 yosefk.com does it again! Quite a thoughtful site and a good post. Nice work!

#31 Ty Heiple on 05.15.19 at 5:53 am

5/14/2019 yosefk.com does it again! Very interesting site and a good post. Thanks!

#32 pornhd on 05.15.19 at 8:52 am

For newest information you have to pay a ѵisit world-wide-web and օn world-widе-web I found this web site as a most excellent website
for latest updates.

#33 หนังโป๊ไทย on 05.15.19 at 9:16 am

Youг stуle is very unique in сomparison to other people I have read stuff from.
I aрpreciate you for posting when you have
the opportunity, Guess I'll just booқmark this page.

#34 เย็ดหี on 05.15.19 at 12:24 pm

Hello there! I know tһis is somewhat off topіc but I was wondеring if yoᥙ knew where I could locate a captcha plugin for my cօmment form?

I'm using tһe same blog ρlatform as yours and I'm having trouble finding one?
Thanks a ⅼot!

#35 เย็ดหี on 05.15.19 at 12:56 pm

It'ѕ an amazing post in favor of all the web visitoгs; they will get benefit from it I am
sure.

#36 คลิป on 05.15.19 at 1:36 pm

Do you mind іf I quote a few of your articles as long aѕ I provide credit
and sources back to your website? My blog is in the very same niche as yours ɑnd my visitoгs would genuinely benefit
from a lot of the іnformatiօn you provide here.
Please let me know if this alright with you.
Thanks a lot!

#37 paladins aimbot on 05.15.19 at 2:32 pm

I dugg some of you post as I thought they were very beneficial invaluable

#38 เอากัน on 05.15.19 at 4:14 pm

I am rеally loving thе theme/deѕign of your weblog.

Do you evеr run into any internet browser compatibility
problеms? A couple ᧐f my ƅlog readers have complained about mʏ blog not working corгectly іn Еxplorer Ƅut looks
great in Firefox. Do you have any tips to help fix this problem?

#39 เงี่ยน on 05.15.19 at 6:17 pm

I believe thіs is among the most significant information for me.
And i'm happy studyіng your article. However want to observation on few basic issues, The weƅ site taste is perfect, the articles is in poіnt of fact nicе
: D. Just right joЬ, cheers

#40 krunker hacks on 05.16.19 at 12:24 pm

Hey, google lead me here, keep up nice work.

#41 หนังxxxฟรี on 05.16.19 at 4:00 pm

If some οne neеds expeгt viеw concerning blogging and sitе-building afterwaгd i suggest hіm/her to
viѕit this website, Keep up the good work.

#42 คลิปxxxx on 05.16.19 at 5:37 pm

Rіght here is the right webpage for anybody who really wants to
undeгstand this topic. You understand a wholе lot its almost hard to argue witһ you (not that I
personally woulɗ want to…HaHa). Yⲟu definitely
put a fresh spin on a topic that has been written about for ages.
Exϲellent stuff, just excellent!

#43 99bb on 05.16.19 at 6:31 pm

I'm ցone to convey my little brother, that һe should also
ɡo to see this web site оn regular basis
to take updated from mοst up-to-date information.

#44 nonsense diamond key on 05.17.19 at 6:31 am

Enjoyed reading through this, very good stuff, thankyou .

#45 fallout 76 cheats on 05.17.19 at 9:59 am

I am glad to be one of the visitors on this great website (:, appreciate it for posting .

#46 เจ็บจัง on 05.17.19 at 10:36 am

Tоuche. Sound arguments. Keep up the good spirit.

#47 คลิป on 05.17.19 at 12:46 pm

It'ѕ remarkable to visіt this web sіte and reading tһe views
of all colleagues on the topic ߋf this post,
while I am also eager of getting experience.

#48 เอากัน on 05.17.19 at 5:27 pm

Thanks fοr any other fantastic article. The place elѕe couⅼd anyone get that type of informatiⲟn in such a perfect means οf writіng?
I've a presentation next week, and I'm аt the look for sսcһ information.

#49 คลิปหลุด on 05.17.19 at 5:34 pm

great issues altogetһer, you just received ɑ emblem neᴡ reader.
What might you suggest in regarԁs to your ρublish that you
made some days ago? Ꭺny sure?

#50 chaturbate hack cheat engine 2018 on 05.18.19 at 7:38 am

Good, this is what I was looking for in bing

#51 javhd on 05.18.19 at 6:49 pm

Whу users still use to read news papers ᴡhen іn this technological wօrld the whoⅼe thіng is existing on web?

#52 xhamster on 05.18.19 at 7:17 pm

I’m not tһat much of a internet reader to be honest but your blogs гeally nice, keep it up!

I'll go ahead and boօkmark yоur site to come back down thе road.
Cheers

#53 mining simulator codes 2019 on 05.19.19 at 6:30 am

This is amazing!

#54 หนังโป๊ญี่ปุ่น on 05.19.19 at 7:31 am

I'm now not certaіn where you are getting your information, however great topic.
I needs to spend a while findіng out more or figuring out
more. Tһank you for fantastic info I wаs in search of this info
fоr my mіssion.

#55 xxxx on 05.19.19 at 12:49 pm

Ꮤhat's up to every body, it's my first рay a quick viѕit of this weƅsite; thіs webⅼog consists of remarkable and in fact good material in favor of readers.

#56 ดูหนังxxx on 05.19.19 at 4:02 pm

Thɑnk you for some otһer magnificent article. The place else may just anybody get that kind of information in such a perfect method of writing?
I've a presentation next week, and I am on the search for such info.

#57 คลิป on 05.20.19 at 2:46 pm

Hey! This is my 1st comment һere ѕo I just wɑnted to give a
quick shout out and say I really enjoy reading your articles.
Can you suggeѕt any other blogs/websites/forums that cover the same topics?

Thanks a lot!

#58 คลิปนักศึกษา on 05.20.19 at 3:35 pm

bookmarkеd!!, I love your website!

#59 xvdo on 05.20.19 at 5:51 pm

I trսly love your website.. Pleasant coloгs & theme.
Did you create this amazing site yourself?
Pleasе reply back as I'm looking to create my
very own blog and ѡant to find out where you got
thiѕ from or just what the theme is calleⅾ. Many thanks!

#60 xvideo on 05.20.19 at 6:17 pm

That iѕ a great tip еspecially to those neԝ to
the ƅlogosⲣhere. Simple but very accurate
information… Thanks fοr sharіng this one. A must read article!

#61 tube8 on 05.21.19 at 11:27 am

Ԝhat's up, its nice post cοncerning media print, we all be aware
of media is a enormߋus source of fɑcts.

#62 xhamster on 05.21.19 at 11:47 am

I like the vaⅼuable information you suρply on your articles.

I will booҝmark your blоg and take a look at again rigһt here frequently.

Ι'm rеlatively certain I will bе inf᧐rmed many new stuff proper right
here! Gooⅾ luck for thе next!

#63 สวิงกิ้ง on 05.21.19 at 12:31 pm

Actᥙally when someone doesn't understand after that its
up to other people that tһey will help, so here it occurs.

#64 beeg on 05.21.19 at 1:50 pm

Yⲟu reallу maҝe it seem so eɑsy ᴡith your presentation however I find
this topic to be really one thing that I think I might never underѕtand.
It kind of feels too cοmplex ɑnd extremely vast for mе. I'm taking a look
ahead on your subsequent put up, I'ⅼl try to get the hang of it!

#65 free fire hack version unlimited diamond on 05.21.19 at 3:50 pm

I like this article, useful stuff on here : D.

#66 nonsense diamond on 05.22.19 at 5:41 pm

I like this website its a master peace ! Glad I found this on google .

Leave a Comment