Thursday 29 April 2010

How to concatenate files without their headers using Perl

== Problem: ==
You have hundreds of files that each start with a header line, and you want
to concatenate all of them, keeping only the header of the first one.

==Solution(s)==

=== Perl one-liner ===


$ ls
file.1 file.2 file.3 file.4 ...

# create a file with the header
# print only the first line of one of the files and
# redirect ('>') to the final file

$ head -n 1 file.1 > concatenated.file

# loop over all the files and print every line except the first one ($.==1)
# if your files have consecutive numeric suffixes, use `seq`
# if not, use `find`, `ls | grep`, etc. ('find' is safer than parsing 'ls' output [google for it])
## (be careful with `ls` if your filenames contain spaces or non-ASCII characters)

### TCSH
$ foreach x ( `seq 1 10` )
foreach? echo $x
foreach? perl -lne 'print if $.>1' file.$x >> concatenated.file
foreach? end

### BASH
$ for x in $(seq 1 10); do echo $x; \
perl -lne 'print if $.>1' file.$x >> concatenated.file; done

# or (printf emits one number per line, because xargs -I substitutes whole input lines)
$ printf '%s\n' {1..10} | xargs -t -I'{}' perl -lne 'print if $.>1' file.'{}' >> concat

## xargs is preferable to a loop, but it is harder to fit everything in a one-liner:
## quoting becomes a problem when you need to do complicated things, or when the
## redirection target also needs to use the loop variable.


=== Only non-perl commands (faster and shorter) ===


# you can use 'find ... -exec ...' or
# use tail -n+
$ head -n1 file.1 > concatenated.file
$ tail -q -n+2 file.* >> concatenated.file

# tail -q suppresses the file-name headers that tail prints when given several files
# tail -n+2 outputs from the second line to the end
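
The `find` variant mentioned above could look roughly like the sketch below, which also copes with spaces in filenames. It assumes GNU or BSD tools (`find -maxdepth`, `-print0`, `sort -z`, `xargs -0`), and the `sample N.txt` names are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Scratch directory with made-up files whose names contain a space.
workdir=$(mktemp -d)
cd "$workdir"
for i in 1 2 3; do
    printf 'name\nrow%s\n' "$i" > "sample $i.txt"
done

# Header once, taken from the first file.
head -n 1 'sample 1.txt' > concatenated.file

# find emits NUL-terminated names, sort -z orders them (find itself
# guarantees no order), and xargs -0 -n 1 calls tail once per file,
# so tail never prints file-name headers and -q is not needed.
find . -maxdepth 1 -type f -name 'sample *.txt' -print0 |
    sort -z |
    xargs -0 -n 1 tail -n +2 >> concatenated.file

cat concatenated.file
```

The output file is deliberately given a name that the `-name` pattern does not match, otherwise it would be read as one of its own inputs.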

# if the numeric order of the suffixes matters (the * expansion sorts 10 before 2)
# you should either rename the files,
# converting 1,2,...,10 to 01,02,...,10 with rename and 'sprintf "%02d",$suff',
# or use a loop with the suffixes in the correct order [ for x in $(seq 1 22) ].
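
As an illustration of the zero-padding idea, here is a sketch that uses plain `mv` and `printf` in place of `rename` (a substitution made here so that it does not depend on which `rename` is installed). The twelve sample files are made up, and `tail -q` is assumed to be available as in the commands above.

```shell
#!/usr/bin/env bash
set -eu

# Scratch directory with made-up files file.1 ... file.12.
workdir=$(mktemp -d)
cd "$workdir"
for i in $(seq 1 12); do
    printf 'header\nline from file %s\n' "$i" > "file.$i"
done

# Zero-pad the numeric suffix to two digits: file.1 -> file.01, etc.
for i in $(seq 1 12); do
    padded=$(printf '%02d' "$i")
    # file.10 to file.12 already have two digits, so skip identical names.
    [ "file.$i" = "file.$padded" ] || mv "file.$i" "file.$padded"
done

# With equal-width suffixes the * expansion follows numeric order.
head -n 1 file.01 > concatenated.file
tail -q -n +2 file.* >> concatenated.file

cat concatenated.file
```

If renaming is not an option, the `seq` loop from the Perl section gives the same ordering without touching the filenames.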
