How many files per directory?
This simple question came up when I was looking after a neglected server and saw far too many log files piling up on it. Such extreme file overflows can go unnoticed until the related application slows down or breaks, or until the filesystem runs out of space or inodes. In the case of log files, the generating application or script is often unaffected, and filesystem space is so plentiful today that the limits are not reached quickly. The same question about potentially large numbers of files comes up when designing an application. The usual standard solution is to put the data into a database, which adds complexity from the start.
Those of you who run a News or BBS system probably know the implications from experience. When I tried to read up on filesystems and their theoretical and practical limits, I found a lot of inconsistent, unclear information on the web. Basically it comes down to this:
I wanted a quick practical test, so I prepared a test script to run on a few exemplary filesystems that were created with default parameters. The script generates small dummy files in an infinite loop while checking how the system copes with it. Unfortunately, the quick tests didn't include 'df -i' to watch the free inodes shrink; it would have been interesting to see that too.
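The original test script is not reproduced here. As a rough sketch of the idea only, a loop like the following creates small numbered files and reports the elapsed time and the directory file size after every batch. The function name make_files, the file naming and the fixed upper limit (instead of an infinite loop) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch, not the original test script.
# make_files DIR TOTAL BATCH
#   Creates TOTAL small files in DIR and prints one status line
#   after every BATCH files.
make_files() {
    dir=$1
    total=$2
    batch=$3
    mkdir -p "$dir" || return 1
    i=1
    start=$(date +%s)
    while [ "$i" -le "$total" ]; do
        # One small dummy file per iteration: 1.txt, 2.txt, ...
        echo "dummy" > "$dir/$i.txt" || return 1
        if [ $((i % batch)) -eq 0 ]; then
            now=$(date +%s)
            # Size of the directory file itself (5th column of ls -ld)
            dirsize=$(ls -ld "$dir" | awk '{print $5}')
            echo "$i files, last batch took $((now - start)) s, directory size $dirsize bytes"
            start=$now
        fi
        i=$((i + 1))
    done
}
```

A call such as make_files /tmp/manyfiles 150000 10000 would print one line per 10,000 files; adding a 'df -i' call at the same place would also show the inode usage.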
Quick Results: Even on an untuned standard filesystem, I did not hit any limits but simply ran out of time, so I decided to stop file generation at 150,000 files.
The chart above shows: File generation slowed down quickly and noticeably with increasing file numbers, I think mostly due to writing the ever-growing directory file back to disk. While the first 10k files were created in 2-3 minutes, it took almost 3 hours to create the last 10k files. These times will improve on systems with much better I/O, but the basic curve is expected to be the same. Now let's see the size of the directory file:
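On Linux, the size of the directory file is what 'ls -ld' reports for the directory itself. A small helper, with the hypothetical name dir_size, could read it as shown below; some filesystems report a value here that is not the real on-disk size, so treat the number as an indication.

```shell
# dir_size DIR
#   Prints the size in bytes that ls reports for the directory file itself.
#   The function name is an assumption made for this example.
dir_size() {
    ls -ld "$1" | awk '{print $5}'
}
```

For example, dir_size /archive prints a single number of bytes.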
While ext2 grew the directory file to about 3 MB and reiserfs to about 4 MB, NTFS blew the directory file up to a whopping 32 MB. Under ext2 and reiserfs, there was no perceived impact on normal file operations (listing, reading, copying or moving files) in a directory with 150,000 files, except for shell expansion issues; more about that below. With NTFS even simple operations became unworkable: a simple 'ls -l | more' command takes forever (33 seconds) to respond with output.

From here on, the article is not quite finished yet, still work in progress...

Some additional hints to work with a huge number of files:

Selecting groups of files out of the large pool becomes a problem because UNIX commands rely on the shell to expand wildcard parameters. For example, ls 1* fails because it is expanded to ls 1.txt 11.txt 12.txt ..., which can exceed the maximum command line length. Piping the output of find into xargs avoids this:

    find /archive -type f -name '*' -print | xargs -l64 -i mv -f {} /oldfiles

Without xargs, find has to fork() and exec() a separate process (for example /usr/bin/rm when removing files) for each individual file, which causes significant system overhead. With xargs it only needs to do this once for every X files, where X is the number of files that fit on one command line. Note that with GNU xargs the -i option implies one command per input line, so the command above still runs mv once per file; to really batch the moves with GNU tools, a form such as find /archive -type f -print0 | xargs -0 mv -f -t /oldfiles is needed.

Some interesting knowledge extracted from the filesystem documentation:

ext2fs theoretical files per directory limit: 1.3 × 10^20 files

In ext2fs, directories are files that are managed as linked lists of variable-length entries. Each entry contains the inode number, the entry length, the file name and its length. So, as the number of files per directory grows, the directory file itself grows, allocating more data blocks if needed.

Further Reading: