Tag Archive > file

Identifying Duplicate Lines in a Text File

» 24 November 2010 » In Open-Source, Perl, Programming » 13 Comments

Checking a text file for duplicate entries has never been easy. There are simple methods, like firing up Notepad and searching for the whole line. But what if you need to identify the line numbers?

Why and How?

Recently, I coded a duplicate line identifier in Perl. I was actually planning to write it in Python, but for the sake of answering this question, I wrote it in Perl. It took me several minutes to get the general idea of how to completely answer that question, and I guess I succeeded.

As for the code, I used the new style I mentioned about two blog posts ago, and it worked well. I’m a bit worried about my variable names though; they make me feel like I coded a mess. But maybe that’s just me.

The code is pretty simple to understand, but since it uses nested loops, I don’t recommend tracing it by hand unless the file is only 2 or 3 lines long. What makes this different from other scripts is that it identifies the line numbers of duplicates, rather than removing the duplicates or just printing them out. It’s handy for, let’s say, debugging a text file. I don’t know if that’s a real thing, but it’s probably the correct term. Anyway, here’s the code.

The Code

#!/usr/bin/perl

#	This program is free software: you can redistribute it and/or modify
#	it under the terms of the GNU General Public License as published by
#	the Free Software Foundation, either version 3 of the License, or
#	(at your option) any later version.
#
#	This program is distributed in the hope that it will be useful,
#	but WITHOUT ANY WARRANTY; without even the implied warranty of
#	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#	GNU General Public License for more details.
#
#	You should have received a copy of the GNU General Public License
#	along with this program.  If not, see <http://www.gnu.org/licenses/>.
#
#	Copyright (c) 2010 Ruel Pagayon <ruel@ruel.me> - http://ruel.me

use strict;
use warnings;

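# Read a whole file and return its lines as a list
sub loadf($) {
    my ($path) = @_;
    # Lexical filehandle; $! adds the reason the open failed
    open(my $fh, '<', $path) or die("Couldn't open " . $path . ": $!\n");
    my @file = <$fh>;
    close($fh);
    return @file;
}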

{
	my @file = loadf("path-to-file.txt");
	my @inner = @file;	# second copy, compared against every line
	my @dup = ( );		# line numbers already reported as duplicates
	my $l1 = 0;		# line number in the outer loop
	foreach my $line (@file) {
		$l1++;
		my $l2 = 0;	# line number in the inner loop
		my $dc = 0;	# duplicates found for the current outer line
		$line =~ s/^\s+//;
		$line =~ s/\s+$//;
		foreach my $iline (@inner) {
			$l2++;
			$iline =~ s/^\s+//;
			$iline =~ s/\s+$//;
			# Skip the line itself, and outer lines already reported
			next if ($l1 == $l2 || grep { $_ == $l1 } @dup);
			if ($iline eq $line) {
				# Print the first occurrence once, before its duplicates
				print "Line " . $l1 . ": " . $line . "\n" if ($dc == 0);
				$dc++;
				print "Line " . $l2 . ": " . $iline . "\n";
				push (@dup, $l2);
			}
		}
		# Blank line between groups of duplicates
		print "\n" unless ($dc == 0);
	}
}

__END__
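
The nested loops above compare every line against every other line, which gets slow on big files. One possible alternative, sketched here and not part of the script above, is a single pass that stores line numbers in a hash keyed by the trimmed text. The name find_duplicates and the sample data are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# find_duplicates() is a hypothetical helper, not part of the original script.
# It takes a list of lines and returns one array ref per duplicated text:
# [ text, line number, line number, ... ] with 1-based line numbers.
sub find_duplicates {
    my @lines = @_;
    my %seen;    # trimmed text => array ref of line numbers
    my @order;   # trimmed texts, in order of first appearance
    my $n = 0;
    foreach my $line (@lines) {
        $n++;
        (my $text = $line) =~ s/^\s+|\s+$//g;    # trim both ends
        push @order, $text unless exists $seen{$text};
        push @{ $seen{$text} }, $n;
    }
    # Keep only the texts that occur on more than one line
    return map { [ $_, @{ $seen{$_} } ] }
           grep { @{ $seen{$_} } > 1 } @order;
}

# Sample input (made up); a real run could pass the lines read from a file
my @sample = ("apple\n", "pear\n", "  apple \n", "fig\n", "pear\n");
foreach my $group (find_duplicates(@sample)) {
    my ($text, @numbers) = @$group;
    print "Line $_: $text\n" foreach @numbers;
    print "\n";
}
```

Like the script above, this trims leading and trailing whitespace before comparing, so blank lines count as duplicates of each other. The trade-off is memory: the hash holds every distinct line of the file.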

In case you have suggestions, or want to download the code without copy-pasting (silly), I’ve posted it as a gist. Please do leave a comment if you have something in mind for this code.


Google Docs: Best Free File Server

» 03 November 2010 » In Guides, Internet » 2 Comments

It’s very hard to find a reliable, fast, and free file hosting service today. Most make you wait before a download starts, and some even require a premium or pro account before the service is any good. But then, there’s Google Docs. It’s not actually a file hosting service, but it can be used as one.

Google Docs is a free, Web-based word processor, spreadsheet, presentation, form, and data storage service offered by Google. It allows users to create and edit documents online while collaborating in real-time with other users. Google Docs combines the features of Writely and Spreadsheets with a presentation program incorporating technology designed by Tonic Systems.

Yes, it’s like a very basic version of Microsoft Office, only it’s online. For now, the upload limit per file is 1GB and the storage capacity is also 1GB, but what more could you ask for? If you need to back up or share important documents or files privately, then this is for you. I myself am using this service to share large image files. With this, you can take advantage of its reliability, security and speed.

Google Docs allows uploads of files with any extension, and you can change the privacy settings of each file. A file can be public, shared by link, or private:

  • Public on the Web - Anyone on the Internet can find and access. No sign-in required.
  • Anyone with the link - Anyone who has the link can access. No sign-in required.
  • Private - Only people explicitly granted permission can access. Sign-in required.

There are so many options to choose from, especially when you’re on a Google Apps domain account. And don’t forget, all of this is FREE! So spread the word.


loadf(); PERL Sub-function

» 26 July 2010 » In Perl » No Comments

I’ve had this for years and, to my surprise, I’m still using this sub-function. It’s simple, and my code gets cleaner when I use it. I just want to share it with you; you might find some use for it. Actually, I should be creating more sub-functions for repetitive tasks. I might come up with more useful ones like this. Heh. :p
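
As a rough illustration of how a sub like loadf can be used, here is a minimal sketch. It assumes loadf takes a file path and returns the file’s lines as a list; the lexical filehandle, the $! in the error message, and the temporary file are choices made for this example and may differ from the version in the full post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sketch of loadf(): read a whole file and return its lines as a list.
# Assumption: one argument (the path); dies with the OS error on failure.
sub loadf {
    my ($path) = @_;
    open(my $fh, '<', $path) or die("Couldn't open $path: $!\n");
    my @lines = <$fh>;
    close($fh);
    return @lines;
}

# Write a small temporary file so the example is self-contained
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
print $tmp "first\nsecond\nthird\n";
close($tmp);

# One call replaces the open/read/close boilerplate
my @file = loadf($tmpname);
chomp(@file);
print scalar(@file), " lines, last one is '$file[-1]'\n";
```

The lines come back with their newlines intact, so chomp them if you only want the text.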
