Tag Archive > perl

Identifying Duplicate Lines in a Text File

» 24 November 2010 » In Open-Source, Perl, Programming » 13 Comments

Checking a text file for duplicate entries has never been easy. There are simple methods, like firing up Notepad and searching for the whole line. But what if you need to identify line numbers?

Why and How?

Recently, I coded a duplicate line identifier in Perl. I was actually planning to do it in Python, but for the sake of answering this question, I wrote it in Perl. It took me several minutes to get the general idea of how to fully answer that question, and I think I succeeded.

As for the code, I used that new style of mine that I mentioned about two blog posts ago, and it worked well. I’m a bit worried about my variables though; they make me feel like I coded a mess. But maybe that’s just me.

The code is pretty simple to understand. Since there are nested loops, I don’t recommend tracing it by hand, though for a 2 or 3 line file, go ahead. What makes this different from other scripts is that it identifies the line numbers of duplicates, rather than removing the duplicates or just printing them out. It’s handy for, let’s say, debugging a text file. I don’t know if that’s a real thing, but it’s probably the right term. Anyway, here’s the code.

The Code

#!/usr/bin/perl

#	This program is free software: you can redistribute it and/or modify
#	it under the terms of the GNU General Public License as published by
#	the Free Software Foundation, either version 3 of the License, or
#	(at your option) any later version.
#
#	This program is distributed in the hope that it will be useful,
#	but WITHOUT ANY WARRANTY; without even the implied warranty of
#	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#	GNU General Public License for more details.
#
#	You should have received a copy of the GNU General Public License
#	along with this program.  If not, see <http://www.gnu.org/licenses/>.
#
#	Copyright (c) 2010 Ruel Pagayon <ruel@ruel.me> - http://ruel.me

use strict;
use warnings;

sub loadf($) {
    # Read the whole file into an array, one element per line (newlines kept)
    open(my $fh, '<', $_[0]) or die("Couldn't open " . $_[0] . ": $!\n");
    my @file = <$fh>;
    close($fh);
    return @file;
}

{
	my @file = loadf("path-to-file.txt");
	my @inner = @file;
	my @dup = ( );	# line numbers already printed as duplicates
	# $l1 / $l2: outer / inner line numbers
	# $l0: set once the first line of a group is printed; $dc: matches found
	my $l0 = 0; my $l1 = 0; my $l2 = 0; my $dc = 0;
	foreach my $line (@file) {
		$l1++;
		$line =~ s/^\s+//;
		$line =~ s/\s+$//;
		foreach my $iline (@inner) {
			$l2++;
			$iline =~ s/^\s+//;
			$iline =~ s/\s+$//;
			# Skip the line itself, and lines already reported in an earlier group
			next if ($l1 == $l2 || grep { $_ == $l1 } @dup );
			if ($iline eq $line) {
				$dc++;
				if ($l0 == 0) {
					print "Line " . $l1 . ": " . $line . "\n";
					$l0++;
				}
				print "Line " . $l2 . ": " . $iline . "\n";
				push (@dup, $l2);
			}
		}
		print "\n" unless($dc == 0);
		$dc = 0; $l0 = 0; $l2 = 0;
	}
}

__END__

In case you have suggestions about this code, or want to download it without copy-pasting (silly), I posted it as a gist. Please do leave a comment if you have something in mind for this code.
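For bigger files, the nested loops get slow, since every line is compared against every other line. Here’s a rough sketch of another way to get the same kind of report in a single pass, using a hash keyed on the trimmed line. This is not the script above; find_duplicates and the sample lines are made up for illustration.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper, not part of the original script.
# Returns one array ref per duplicated text: [ text, line numbers... ],
# with 1-based line numbers, in order of first appearance.
sub find_duplicates {
    my @lines = @_;
    my (%seen, @order);
    my $n = 0;
    foreach my $line (@lines) {
        $n++;
        # Trim a copy so the caller's data is left alone
        (my $key = $line) =~ s/^\s+|\s+$//g;
        push @order, $key unless exists $seen{$key};
        push @{ $seen{$key} }, $n;
    }
    # Keep only the texts that showed up more than once
    return map { [ $_, @{ $seen{$_} } ] } grep { @{ $seen{$_} } > 1 } @order;
}

# Made-up sample data standing in for the lines of a file
my @sample = ("apple\n", "pear\n", "  apple \n", "fig\n", "pear\n");
foreach my $group (find_duplicates(@sample)) {
    my ($text, @numbers) = @$group;
    print "Line $_: $text\n" foreach @numbers;
    print "\n";
}
```

The trade-off is memory: the hash keeps every distinct line around, which should be fine for ordinary text files.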

Recursive Search and Replace with Perl

» 20 November 2010 » In Open-Source, Perl, Programming » No Comments

A few days ago, I was trying to search and replace a string in a bunch of files nested in many subdirectories. Then I remembered, I can code. I wrote a script to do this for me, and I called it RName.

At first I wasn’t planning to release the code, but a few hours ago I changed my mind. The code is now freely available on GitHub. It takes 3 arguments: the string to search for, the replacement string, and the directory. It still needs a lot of changes and features, which is one of the reasons I released it. Come to think of it, this is just a simple script, but with some work it could become one big project. I’m not assuming anything though; we’ll see if other devs fork it.

Coding it was simple, but once I decided to release it, I had to formalize things. I separated the sub-functions and placed them in a module. That’s the first time I’ve done this, by the way. I also refined my main script’s structure, which I will be using in my future scripts. Here’s what my basic skeleton script in Perl looks like:

#!/usr/bin/perl

use strict;
use warnings;

{
	# ... main code
}
__END__

I also included exts.txt, which holds all the valid extensions that the script can edit. The extensions are currently separated by newlines.
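To give an idea of how a script like this can work, here’s a minimal sketch built on the core File::Find module. It is not RName’s actual code: replace_in_tree is a made-up name, and I’m assuming exts.txt lists one extension per line without the leading dot, and that the search string is plain text rather than a regular expression.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use File::Find;

# Hypothetical sketch, not RName's actual code.
# Replaces every literal occurrence of $search with $replace in files
# under $dir whose extension is listed in @exts.
# Returns the number of files changed.
sub replace_in_tree {
    my ($search, $replace, $dir, @exts) = @_;
    my %allowed = map { lc($_) => 1 } @exts;
    my $changed = 0;
    find(sub {
        return unless -f $_;
        my ($ext) = /\.([^.]+)$/;
        return unless defined $ext && $allowed{ lc $ext };
        open(my $in, '<', $_) or die "Couldn't open $File::Find::name: $!\n";
        my $text = do { local $/; <$in> };    # slurp the whole file
        close($in);
        # \Q...\E makes the search string literal instead of a regex
        my $count = ($text =~ s/\Q$search\E/$replace/g);
        return unless $count;
        open(my $out, '>', $_) or die "Couldn't write $File::Find::name: $!\n";
        print {$out} $text;
        close($out);
        $changed++;
    }, $dir);
    return $changed;
}

# Usage: perl script.pl SEARCH REPLACE DIRECTORY
if (@ARGV == 3) {
    my ($search, $replace, $dir) = @ARGV;
    # Assumption: exts.txt holds one extension per line, without the dot
    open(my $fh, '<', 'exts.txt') or die "Couldn't open exts.txt: $!\n";
    chomp(my @exts = <$fh>);
    close($fh);
    print replace_in_tree($search, $replace, $dir, @exts), " file(s) changed\n";
}
```

It overwrites files in place, so try it on a copy of the directory first.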

I wish to develop more scripts, and hopefully my range will grow from simple projects to complex ones. That’s it for now; just leave a comment here if you have something to say.

Perl: Trimming a String

» 16 November 2010 » In Perl, Programming » 2 Comments

This is a quick tip for Perl coders who have trouble trimming whitespace from a string. By trimming, I mean removing leading and/or trailing whitespace characters. Whitespace characters include spaces, newlines, tabs, etc. It’s really not that hard, just two lines of regular expressions.

Back in 2007, when I was learning Perl, I looked into a friend’s source code. I forgot his handle, so I can’t thank him properly. Anyway, I was copying parts of his code back then to create a login checker for a site I also forgot. It was all going smoothly until I encountered two weird lines. I was still learning Perl, so I had no idea what regular expressions were back then. These are those two lines:

$line =~ s/^\s+//;
$line =~ s/\s+$//;

Those two lines trim a string. The first removes leading whitespace and the second removes trailing whitespace. If you know regular expressions, you probably already know what those lines mean. In fact, I still use them to this day (if you do not believe me, check my older posts on this blog).

One scenario where you would probably use trimming is when reading a file line by line. Suppose you have already loaded a file into an array named @file.

foreach my $line (@file) {
	$line =~ s/^\s+//;
	$line =~ s/\s+$//;
	...
}

Without that trimming, there’s a high chance your string will end in a newline. That’s because when you load the file into the array, the newline is included in each line. If the file uses Windows line endings, a carriage return may be left at the end as well, and lines can also start with stray whitespace, so it’s worth cleaning up both ends.

Now, most of you probably hate typing those over and over again. For me it’s fun; I don’t know why, but I’m just used to typing those lines. Heh. Anyway, you can place them in sub-functions so you do not have to worry.
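Here’s a small sketch if you want to see that newline for yourself. The sample text is made up, and it is read from an in-memory string instead of a real file so it runs anywhere.

```perl
use strict;
use warnings;

# Illustrative data only: an in-memory "file" instead of a real one
my $data = "  first line\nsecond line\t\n";
open(my $fh, '<', \$data) or die "Couldn't open in-memory file: $!\n";
my @file = <$fh>;
close($fh);

foreach my $line (@file) {
    my $before = length $line;    # still counts the newline and padding
    $line =~ s/^\s+//;
    $line =~ s/\s+$//;
    print "[$line] was $before characters, now ", length($line), "\n";
}
```

Since foreach aliases each element, the lines inside @file end up trimmed too, not just the loop variable.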

sub fulltrim($) {
	my $string = $_[0];
	$string =~ s/^\s+//;
	$string =~ s/\s+$//;
	return $string;
}
sub ltrim($) {
	my $string = $_[0];
	$string =~ s/^\s+//;
	return $string;
}
sub rtrim($) {
	my $string = $_[0];
	$string =~ s/\s+$//;
	return $string;
}

And there you have it. You can reuse the sub-functions without typing regular expressions. Happy coding!
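A quick usage sketch, in case it helps. The three subs are repeated in compact form so the snippet runs on its own, and the sample string is made up. I left out the ($) prototype here: with it, passing an array would hand the sub the element count instead of a string.

```perl
use strict;
use warnings;

# Compact copies of the subs above, without the ($) prototype
sub fulltrim { my $s = $_[0]; $s =~ s/^\s+//; $s =~ s/\s+$//; return $s; }
sub ltrim    { my $s = $_[0]; $s =~ s/^\s+//; return $s; }
sub rtrim    { my $s = $_[0]; $s =~ s/\s+$//; return $s; }

my $raw = "\t  hello world \n";

# Brackets make the remaining whitespace visible
print "[", fulltrim($raw), "]\n";   # both ends removed
print "[", ltrim($raw), "]\n";      # trailing space and newline kept
print "[", rtrim($raw), "]\n";      # leading tab and spaces kept
```

Each sub works on a copy, so $raw itself is never modified.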

Facebook Album Downloader

» 08 September 2010 » In Perl, Programming » 3 Comments

Last July, I posted the Multiply Album Downloader. To be honest, I didn’t know that I had posted it here. All I knew was that I posted that script on my old blog. I just discovered that a few minutes ago. Anyway, I coded this script last week, because I needed to download a lot of pictures from my friends’ Facebook albums.

loadf(); PERL Sub-function

» 26 July 2010 » In Perl » No Comments

I’ve had this for years and, to my surprise, I’m still using this sub-function. It’s simple, and my code gets cleaner when I use it. I just want to share it with you; you might find some use for it. Actually, I should be creating more sub-functions for repetitive tasks. I might come up with good, useful ones like this. Heh. :p

Socks Proxy on Perl

» 22 July 2010 » In Perl » No Comments

Just in case your script needs to transfer/deliver data across the web securely, you might want to add SOCKS support to it.
