Wednesday 24 March 2010

MAPPING files into memory vs READING them

brian d foy's 'Effective Perler' blog has a post about mapping files into memory to avoid the I/O and memory footprint of reading them in:

Memory-map files instead of slurping them

It uses the File::Map module:


use strict;
use warnings;
use File::Map qw(map_file);

{
    my $start = time;
    map_file my $map, '/Volumes/Hercules/Red/revealing_it_all_big.mov';
    my $loadtime = time - $start;
    print "Loaded file in $loadtime seconds\n";
    # the /g flag is needed so every match is counted, not just the first
    my $count = () = $map =~ /abc/g;
    print "Found $count occurrences\n";
}

[copy paste from the blog]
The $map acts just like a normal Perl string, and you don’t have to worry about any of the mmap details. When the variable goes out of scope, the map is broken and your program doesn’t suffer from a large chunk of unused memory.
In Tim Bray’s Wide Finder contest to find the fastest way to process log files with “wider” rather than “faster” processors, the winning solution was a Perl implementation using mmap (although using the older Sys-Mmap). Perl had nothing special in that regard because most of the top solutions used mmap to avoid the I/O penalty.
The mmap is especially handy when you have to do this with several files at the same time (or even sequentially if Perl needs to find a chunk of contiguous memory). Since you don’t have the data in real memory, you can mmap as many files as you like and work with them simultaneously.
Also, since the data actually live on the disk, different programs running at the same time can share the data, including seeing the changes each program makes (although you have to work out the normal concurrency issues yourself). That is, mmap is a way to share memory.
The File::Map module can do much more too. It allows you to lock filehandles, and you can also synchronize access from threads in the same process.
If you don’t actually need the data in your program, don’t ever load it: mmap it instead.
[/end copy paste]
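
To get a feel for the several-files-at-once point above, here is a rough sketch of what that could look like. The log paths are made up for illustration; each map only costs address space until its pages are touched, so a handful of big files can be searched side by side:

use strict;
use warnings;
use File::Map qw(map_file);

# hypothetical log files, purely for illustration
map_file my $jan, '/logs/access.2010-01';
map_file my $feb, '/logs/access.2010-02';
map_file my $mar, '/logs/access.2010-03';

for my $month ($jan, $feb, $mar) {
    my $hits = () = $month =~ /GET /g;   # count GET requests in each mapped file
    print "$hits GET requests\n";
}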
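
The sharing-between-programs part depends on mapping the file read-write. A minimal sketch, assuming a small fixed-size status file that another process maps the same way (the path and the 4-byte layout are invented); a mapped string must keep its length, so the write goes through four-argument substr:

use strict;
use warnings;
use File::Map qw(map_file sync);

# hypothetical fixed-size file that another process also maps
map_file my $status, '/tmp/shared_status.dat', '+<';   # '+<' maps it read-write

substr $status, 0, 4, 'BUSY';   # overwrite 4 bytes in place, keeping the length
sync $status;                   # flush the change back to the file on disk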
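
And the thread synchronization the post mentions looks roughly like this. This is only a sketch built on File::Map's documented lock_map (which holds the map for the current thread until the end of the enclosing block); the anonymous map, the 8-byte counter layout and the thread counts are all made up, so check the module's docs before leaning on it:

use strict;
use warnings;
use threads;
use File::Map qw(map_anonymous lock_map);

map_anonymous my $counter, 8;               # 8 writable bytes visible to every thread
substr $counter, 0, 8, sprintf('%08d', 0);  # initialise the counter field

my @workers = map {
    threads->create(sub {
        for (1 .. 1000) {
            lock_map $counter;              # exclusive access for this block
            my $n = 1 + substr($counter, 0, 8);
            substr $counter, 0, 8, sprintf('%08d', $n);
        }                                   # lock is released at the end of the block
    });
} 1 .. 4;
$_->join for @workers;

print 'Final count: ', 0 + substr($counter, 0, 8), "\n";   # expect 4000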
