Tuesday, September 8, 2009

A History of Improving File System Performance

A Fast File System for UNIX
This paper begins by noting the kinds of applications, such as VLSI design and image processing, that do a small amount of processing on a large quantity of data and therefore need high throughput from the file system. In the traditional UNIX file system, inodes and data blocks are not laid out consecutively, which results in many disk seeks. To improve file system throughput, a larger block size was used. But a larger block size wastes a larger percentage of space on small files. The design team countered this waste by storing small files, and the tails of larger files, in fragments within blocks. When this new file system was tested, read and write bandwidth improved by an order of magnitude. The price paid is some wasted disk space and increased CPU utilization. With CPU speed and disk capacity increasing exponentially and disk bandwidth lagging far behind, this seems like a desirable tradeoff.
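The block-versus-fragment tradeoff can be illustrated with a toy calculation (my own sketch, not from the paper; the 4 KB block and 1 KB fragment sizes are assumptions):

```python
BLOCK = 4096      # hypothetical large block size (bytes)
FRAG = 1024       # hypothetical fragment size (BLOCK / 4)

def waste_blocks_only(size):
    """Bytes wasted when a file must occupy whole blocks."""
    blocks = -(-size // BLOCK)          # ceiling division
    return blocks * BLOCK - size

def waste_with_fragments(size):
    """Bytes wasted when the last partial block uses fragments instead."""
    full_blocks, tail = divmod(size, BLOCK)
    if tail == 0:
        return 0
    frags = -(-tail // FRAG)            # fragments needed for the tail
    return frags * FRAG - tail

for size in (100, 5000, 8192):
    print(size, waste_blocks_only(size), waste_with_fragments(size))
# A 100-byte file wastes 3996 bytes in whole blocks but only 924 in fragments.
```

Small files dominate most file systems, so recovering most of the per-file waste while keeping the large block size for bandwidth is the essence of the design.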

The Design and Implementation of a Log-Structured File System
This paper reiterates the fact that processor speed is increasing exponentially, while memory and disk speeds are being left behind. While there are ways to increase disk throughput, there are no foreseeable improvements for access time. Information is spread around the disk, which causes too many disk accesses. So it seems that we should attempt to minimize disk accesses. A log-structured file system (LFS), which writes all modifications to disk sequentially, attempts to accomplish this goal.
A traditional file system achieves logical locality by assuming certain access patterns and pays extra on writes, to organize information optimally on disk for the assumed read patterns. In contrast, LFS achieves temporal locality: information that is created or modified at the same time will be grouped closely on disk. If temporal locality matches logical locality, as it does for a file that is written sequentially and then read sequentially, then an LFS should have about the same performance on large files as a traditional file system. If temporal locality differs from logical locality then the systems will perform differently: LFS will perform poorly on reads. The log structure does, however, speed up crash recovery: the system only needs to scan the most recent portion of the log, instead of the entire disk.
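The core idea above can be reduced to a few lines (a toy model of my own, not the paper's implementation): every write appends to the tail of a log, and an in-memory map records where the latest version of each block lives.

```python
class LogFS:
    """Toy log-structured store: all writes are sequential appends."""

    def __init__(self):
        self.log = []          # the "disk", written strictly sequentially
        self.latest = {}       # (file, block_no) -> offset of newest copy

    def write(self, file, block_no, data):
        self.latest[(file, block_no)] = len(self.log)
        self.log.append(data)  # sequential append: no seek to a home location

    def read(self, file, block_no):
        # Reads may jump all over the log -- this is the poor logical locality.
        return self.log[self.latest[(file, block_no)]]

fs = LogFS()
fs.write("a", 0, "v1")
fs.write("a", 0, "v2")         # an overwrite also appends; "v1" becomes garbage
print(fs.read("a", 0))         # -> v2
print(len(fs.log))             # -> 2 (the stale copy awaits segment cleaning)
```

The stale copies left behind by overwrites are why a real LFS needs a segment cleaner to reclaim log space, a major source of the system's complexity.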
The main problem with LFS is degraded performance for logical reads, which is what journaling file systems try to address.

Analysis and Evolution of Journaling File Systems
This paper gives an overview of various journaling file systems in use today. Journaling tries to provide the best of both worlds: data is written to the log sequentially, then moved to its logical location. This provides fast crash recovery, while all normal reads occur from the standard blocks. Journaling is the approach taken by modern production file systems such as Linux ext3 and Windows NTFS.
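The write path can be sketched as follows (a hedged toy model of write-ahead journaling, not ext3 or NTFS internals; real systems batch transactions and checkpoint lazily rather than on every commit):

```python
class JournalingFS:
    """Toy journaling store: updates hit a sequential journal before home."""

    def __init__(self, nblocks):
        self.disk = [None] * nblocks   # "home" (in-place) block locations
        self.journal = []              # sequential log of pending updates
        self.pending = []

    def begin(self):
        self.pending = []

    def write(self, block_no, data):
        self.pending.append((block_no, data))

    def commit(self):
        self.journal.extend(self.pending)    # sequential journal write
        self.journal.append("COMMIT")        # transaction now durable
        for block_no, data in self.pending:  # checkpoint to home locations
            self.disk[block_no] = data
        self.pending = []

    def recover(self):
        # After a crash: replay committed transactions, drop the rest.
        txn = []
        for record in self.journal:
            if record == "COMMIT":
                for block_no, data in txn:
                    self.disk[block_no] = data
                txn = []
            else:
                txn.append(record)  # an uncommitted tail is simply discarded

fs = JournalingFS(4)
fs.begin(); fs.write(2, "data"); fs.commit()
print(fs.disk[2])   # -> data
```

Recovery only scans the journal, which is why crashes are cheap; normal reads go straight to `self.disk`, which is why read performance matches a traditional layout.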
