Memory-map files instead of slurping them
This example uses the File::Map module:
use File::Map qw(map_file);
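{
    my $start = time;
    # Map the file instead of reading its contents into memory
    map_file my $map, '/Volumes/Hercules/Red/revealing_it_all_big.mov';
    my $loadtime = time - $start;
    print "Loaded file in $loadtime seconds\n";
    # With /g in list context the match returns every hit, so the count is right
    my $count = () = $map =~ /abc/g;
    print "Found $count occurrences\n";
}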
The $map acts just like a normal Perl string, and you don’t have to worry about any of the mmap details. When the variable goes out of scope, the file is unmapped and your program doesn’t suffer from a large chunk of unused memory.
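As a rough illustration of the point that a mapped variable behaves like a string, here is a minimal sketch. It assumes File::Map is installed from CPAN. The temporary file and its contents are made up for the example so that it can run on its own.

```perl
use strict;
use warnings;
use File::Map qw(map_file);
use File::Temp qw(tempfile);

# Hypothetical sample data: a small throwaway file stands in for a big one.
my ($fh, $filename) = tempfile(UNLINK => 1);
print {$fh} "abc 123 abc 456 abc\n";
close $fh or die "Cannot close $filename: $!";

my ($length, $position, $count);
{
    # Read-only mapping; '<' is also the default mode.
    map_file my $map, $filename, '<';

    $length   = length $map;           # size of the mapped file in bytes
    $position = index $map, '456';     # ordinary string search
    $count    = () = $map =~ /abc/g;   # count every match, not just the first
}   # $map goes out of scope here, which unmaps the file

print "length=$length position=$position count=$count\n";
```

The bare block is only there to limit the lifetime of $map. File::Map also exports an unmap function if you would rather release the mapping explicitly.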
In Tim Bray’s Wide Finder contest to find the fastest way to process log files with “wider” rather than “faster” processors, the winning solution was a Perl implementation using mmap (although using the older Sys-Mmap). Perl had nothing special in that regard, because most of the top solutions used mmap to avoid the I/O penalty.
mmap is especially handy when you have to do this with several files at the same time (or even sequentially, if Perl would otherwise need to find a chunk of contiguous memory for each one). Since the data isn’t loaded into real memory, you can mmap as many files as you like and work with them simultaneously.
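Working with several mapped files at once might look something like the following sketch. It assumes File::Map is installed. The two temporary "log" files and their contents are invented for the example.

```perl
use strict;
use warnings;
use File::Map qw(map_file);
use File::Temp qw(tempfile);

# Helper for this sketch only: write some text to a temp file, return its name.
sub make_sample_file {
    my ($text) = @_;
    my ($fh, $filename) = tempfile(UNLINK => 1);
    print {$fh} $text;
    close $fh or die "Cannot close $filename: $!";
    return $filename;
}

# Hypothetical stand-ins for two large log files.
my $first_file  = make_sample_file("GET /a\nGET /b\nPOST /c\n");
my $second_file = make_sample_file("GET /d\nPOST /e\nPOST /f\n");

# Both files are mapped at the same time; neither is slurped.
map_file my $first,  $first_file;
map_file my $second, $second_file;

my $first_gets  = () = $first  =~ /^GET /mg;
my $second_gets = () = $second =~ /^GET /mg;
my $total_bytes = length($first) + length($second);

print "GET requests: $first_gets and $second_gets ($total_bytes bytes mapped)\n";
```

Each mapping gets its own lexical variable here, which keeps the lifetime of every map obvious.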
Also, since the data actually live on the disk, different programs running at the same time can share the data, including seeing the changes each program makes (although you have to work out the normal concurrency issues yourself). That is, mmap is a way to share memory.
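To see changes made through a map end up in the file, you can open the map read-write. This is a minimal single-process sketch, assuming File::Map is installed. The temporary file is made up for the example, and a real multi-process setup would need its own coordination. Note that a mapped variable can’t change length, so the replacement must be exactly as long as the text it overwrites.

```perl
use strict;
use warnings;
use File::Map qw(map_file);
use File::Temp qw(tempfile);

# Hypothetical sample file for the sketch.
my ($fh, $filename) = tempfile(UNLINK => 1);
print {$fh} "hello world\n";
close $fh or die "Cannot close $filename: $!";

{
    # '+<' maps the file for both reading and writing.
    map_file my $map, $filename, '+<';

    # Overwrite five bytes with five bytes, so the length stays the same.
    substr $map, 0, 5, 'HELLO';
}   # unmapping happens here

# Read the file back the ordinary way to check what is on disk.
open my $in, '<', $filename or die "Cannot open $filename: $!";
my $contents = do { local $/; <$in> };
close $in;

print $contents;
```

Another program that maps or reads the same file should see the modified bytes too, which is the sharing described above.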
The File::Map module can do much more too. It allows you to lock filehandles, and you can also synchronize access from threads in the same process.
If you don’t actually need the data in your program, don’t ever load it: mmap it instead.