Downloading Facebook Albums with C#

» 30 March 2011 » In C#, Open-Source, Programming, Releases » 3 Comments

If you remember, I previously posted a script written in Perl that does the same job (downloading Facebook albums). Sadly, that script no longer works, and because of the bad practices I used in it, I didn’t feel like updating it. So instead, I created a new project, this time in C#. That means the project has a GUI, which most users will love.

I actually started this 20 days ago, but I didn’t have the chance to work on it until last night. What it does is pretty simple. It uses the Facebook Graph API instead of scraping pages, so it is far less likely to break when Facebook changes its layout. Let’s take a look at the process.

ADown Flowchart

To be honest, this is the first time I’ve used a flowchart for one of my projects. At first I wasn’t convinced, but then I realized it is one of the best ways to explain how a process works.
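
In code, the core of that process is a loop: fetch a page of photos from the Graph API, hand each photo over for download, and follow the page’s “next” link until there is none. Below is a minimal Python sketch of that loop, not ADown’s actual C# code. It assumes the Graph API JSON shape of a `data` list, an optional `paging.next` URL, and an `error` object on failure, and it assumes each photo entry carries a `source` URL; `iter_album_photos` and the injected `fetch_json` callable are hypothetical names, so the logic can be exercised without a network.

```python
from typing import Callable, Iterator, Optional, Tuple


def iter_album_photos(first_url: str,
                      fetch_json: Callable[[str], dict]) -> Iterator[Tuple[str, str]]:
    """Yield (photo_id, source_url) pairs, following Graph API paging links.

    Assumed page shape:
      {"data": [{"id": ..., "source": ...}, ...], "paging": {"next": ...}}
    """
    url: Optional[str] = first_url
    while url:
        page = fetch_json(url)
        # Assumption: failures (e.g. missing permissions) arrive as an "error" object.
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "Graph API error"))
        for photo in page.get("data", []):
            yield photo["id"], photo["source"]
        # No "next" link means this was the last page.
        url = page.get("paging", {}).get("next")


if __name__ == "__main__":
    # Fake two-page album standing in for real HTTP responses.
    pages = {
        "p1": {"data": [{"id": "1", "source": "http://example.com/1.jpg"}],
               "paging": {"next": "p2"}},
        "p2": {"data": [{"id": "2", "source": "http://example.com/2.jpg"}]},
    }
    for photo_id, src in iter_album_photos("p1", pages.__getitem__):
        print(photo_id, src)
```

Passing the fetch function in, rather than calling an HTTP library directly, keeps the paging logic separate from the networking code.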

I also used an external JSON parser (most of the libraries I found were serializers). Credit for the parser goes here.

Here’s a screenshot of the GUI:

ADown Screenshot

There are a few things I should mention:

  • Download speed depends on your connection speed
  • By default, photos are saved to a folder named ‘ADown’ inside your Documents folder
  • Each picture is named after its photo ID in the Graph API
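
The folder and naming rules above can be expressed as one small path-building function. This is a hypothetical Python sketch, not ADown’s code: it assumes the Documents folder sits directly under the user’s home directory, assumes a `.jpg` extension, and replaces characters Windows does not allow in folder names so that odd album titles still produce a valid path.

```python
from pathlib import Path
from typing import Optional

# Characters that are not allowed in Windows file or folder names.
_INVALID = '<>:"/\\|?*'


def photo_path(album_name: str, photo_id: str, ext: str = ".jpg",
               base_dir: Optional[Path] = None) -> Path:
    """Return <base_dir>/<album_name>/<photo_id><ext> without creating anything."""
    if base_dir is None:
        # Assumption: the default download folder is ~/Documents/ADown.
        base_dir = Path.home() / "Documents" / "ADown"
    safe_album = "".join("_" if c in _INVALID else c for c in album_name).strip()
    # One folder per album, one file per photo ID.
    return base_dir / (safe_album or "album") / f"{photo_id}{ext}"
```

A caller would create the album folder with `path.parent.mkdir(parents=True, exist_ok=True)` before writing the file.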

And of course, the features:

  • Threaded
  • Verbose
  • Uses Facebook Graph API
  • Able to change download folder
  • Creates a folder for every album
  • Accepts the Album URL (easy copy-paste)
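
The “Threaded” and “Verbose” features boil down to downloading several photos at once and reporting each result as it finishes. Here is a minimal Python sketch of that idea using a thread pool; it is not the project’s C# implementation, and `download_all` and the injected `download(photo_id, url)` callable are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple


def download_all(photos: Iterable[Tuple[str, str]],
                 download: Callable[[str, str], None],
                 workers: int = 4,
                 verbose: bool = True) -> Dict[str, str]:
    """Run download(photo_id, url) for every photo on a pool of threads.

    Returns a dict mapping each photo_id to "ok" or an error message.
    """
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(download, pid, url): pid for pid, url in photos}
        # as_completed yields futures in the order they finish, not submit order.
        for fut in as_completed(futures):
            pid = futures[fut]
            try:
                fut.result()
                results[pid] = "ok"
            except Exception as exc:  # one failed photo should not stop the rest
                results[pid] = f"error: {exc}"
            if verbose:
                print(f"{pid}: {results[pid]}")
    return results
```

Downloads are I/O-bound, so threads help here even in Python; the results are collected on the calling thread, so the dict needs no locking.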

If you don’t have permission to access an album through the Graph API (even if it’s visible to you on Facebook), you won’t be able to download it. An example would be a public album of someone who’s not on your friends list.

The source code is, of course, freely available on GitHub and released under the BSD 3-clause license. Help me maintain it by forking it. :)

An executable is also available in case you don’t have VS2010 installed. You need to install .NET Framework 4 first. Download the executable file here.

That’s all for this project. Please leave a comment if you have something in mind. Thank you.

C# Basic HTTP Request Class

» 11 March 2011 » In C#, Programming » 7 Comments

Most of my clients want me to build things with a graphical user interface, and for that I only have one language in mind: C#. Many of my client projects involve getting information from web pages, logging in, gathering data, etc. Repeating the same HttpWebRequest and HttpWebResponse code over and over again is tedious, so I created a simple class that does GET and POST neatly and handles cookies. Please note that I didn’t reinvent the wheel here. This is just a collection of commonly used functions, made reusable for my (and maybe your) projects.

A simple HTTP GET request looks like this:

HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://ruel.me");
request.CookieContainer = cJar;   // a CookieContainer declared elsewhere
request.UserAgent = UserAgent;    // a user-agent string declared elsewhere
request.KeepAlive = false;
request.Method = "GET";
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (StreamReader sr = new StreamReader(response.GetResponseStream()))
{
    string html = sr.ReadToEnd();
}

I think it’s unacceptable to repeat this code over and over again in a single project. So yes, there should be a class for it (together with the POST request).
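
For comparison, here is the same “write it once” idea sketched with Python’s standard library rather than C#. It is an illustration, not part of BReq: the hypothetical `build_request` helper builds either a GET request or a form-encoded POST request with a shared user agent and an optional referrer, so the boilerplate lives in one place. The user-agent string is a placeholder.

```python
import urllib.parse
import urllib.request
from typing import Optional

USER_AGENT = "Mozilla/5.0 (example)"  # placeholder user agent


def build_request(url: str, post: Optional[dict] = None,
                  referer: str = "") -> urllib.request.Request:
    """Build a GET request, or a form-encoded POST request when post is given."""
    headers = {"User-Agent": USER_AGENT}
    if referer:
        headers["Referer"] = referer
    data = None
    if post is not None:
        # Same content type that BReq.HttpPost sends.
        data = urllib.parse.urlencode(post).encode("ascii")
        headers["Content-Type"] = "application/x-www-form-urlencoded"
    # urllib switches the method to POST automatically when data is set.
    return urllib.request.Request(url, data=data, headers=headers)
```

Sending the result is then a single `urllib.request.urlopen(req)` call, or a call through an opener that carries cookies.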

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Net;
using System.IO;

namespace BasicReq
{
    /// <summary>
    /// A simple basic class for HTTP Requests.
    /// </summary>
    class BReq
    {
        /// <summary>
        /// UserAgent to be used on the requests
        /// </summary>
        public string UserAgent = @"Mozilla/5.0 (Windows; Windows NT 6.1) AppleWebKit/534.23 (KHTML, like Gecko) Chrome/11.0.686.3 Safari/534.23";
        
        /// <summary>
        /// Cookie Container that will handle all the cookies.
        /// </summary>
        private CookieContainer cJar;

        /// <summary>
        /// Performs a basic HTTP GET request.
        /// </summary>
        /// <param name="url">The URL of the request.</param>
        /// <returns>HTML Content of the response.</returns>
        public string HttpGet(string url)
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            request.CookieContainer = cJar;
            request.UserAgent = UserAgent;
            request.KeepAlive = false;
            request.Method = "GET";
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (StreamReader sr = new StreamReader(response.GetResponseStream()))
            {
                return sr.ReadToEnd();
            }
        }

        /// <summary>
        /// Performs a basic HTTP POST request
        /// </summary>
        /// <param name="url">The URL of the request.</param>
        /// <param name="post">POST Data to be passed.</param>
        /// <param name="refer">Referrer of the request</param>
        /// <returns>HTML Content of the response.</returns>
        public string HttpPost(string url, string post, string refer = "")
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            request.CookieContainer = cJar;
            request.UserAgent = UserAgent;
            request.KeepAlive = false;
            request.Method = "POST";
            request.Referer = refer;

            // The POST data is expected to be URL-encoded already, so ASCII is enough.
            byte[] postBytes = Encoding.ASCII.GetBytes(post);
            request.ContentType = "application/x-www-form-urlencoded";
            request.ContentLength = postBytes.Length;

            using (Stream requestStream = request.GetRequestStream())
            {
                requestStream.Write(postBytes, 0, postBytes.Length);
            }

            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (StreamReader sr = new StreamReader(response.GetResponseStream()))
            {
                return sr.ReadToEnd();
            }
        }

        /// <summary>
        /// Writes the HTML string to debug.html in the current working directory.
        /// </summary>
        /// <param name="html">HTML String.</param>
        public void DebugHtml(string html)
        {
            using (StreamWriter sw = new StreamWriter("debug.html"))
            {
                sw.Write(html);
            }
        }

        /// <summary>
        /// Initializes a new instance of the <see cref="BReq"/> class.
        /// </summary>
        public BReq()
        {
            cJar = new CookieContainer();
        }
    }
}

UserAgent is a public field, so you can change it at any time after creating an instance. The cookie container, on the other hand, is private to the class. Yes, that’s a major drawback, but you can freely modify it. It also means that if you want to keep your cookies across requests, you have to make all of them through the same BReq instance.
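
One way around that drawback is to make the jar public and route every request through one shared object. Here is that design sketched in Python (the class itself stays C#); `SessionClient` is a hypothetical name, and the user-agent string is a placeholder. `http.cookiejar.CookieJar`, `HTTPCookieProcessor`, and `build_opener` are standard-library APIs.

```python
import http.cookiejar
import urllib.request


class SessionClient:
    """Hypothetical helper: one opener and one public cookie jar per session."""

    def __init__(self, user_agent: str = "Mozilla/5.0 (example)") -> None:
        # Public, so callers can inspect, save, or share cookies
        # (unlike BReq's private cJar).
        self.cookies = http.cookiejar.CookieJar()
        self.opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(self.cookies))
        self.opener.addheaders = [("User-Agent", user_agent)]

    def get(self, url: str, timeout: float = 30.0) -> str:
        # Every request goes through the same opener, so cookies set by
        # earlier responses are sent with later requests.
        with self.opener.open(url, timeout=timeout) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
```

Two `SessionClient` objects keep separate cookies, which mirrors the “one instance per session” rule above while still letting you read or replace the jar.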

There’s also a DebugHtml function that helps you inspect the last request: pass it the response string (from outside the class) and it writes the content to debug.html.
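
A Python counterpart of DebugHtml takes only a few lines. This hypothetical `debug_html` helper writes with an explicit UTF-8 encoding, so the saved page does not depend on the platform’s default encoding, and returns the path so you can open it in a browser.

```python
from pathlib import Path


def debug_html(html: str, path: str = "debug.html") -> Path:
    """Write a response body to a file for inspection and return its path."""
    out = Path(path)
    out.write_text(html, encoding="utf-8")
    return out
```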

That’s it for now. I highly suggest we help each other improve this class by forking it, so that improvements and ideas can be added. The more the merrier! Thank you.

Forks

These people made wonderful forks!

Scrape Your Facebook Friends’ Emails with Python

» 03 March 2011 » In Internet, Open-Source, Programming, Python » 34 Comments

This is an update of an earlier post about Facebook contact info scraping.

DISCLAIMER: This is against Facebook’s Terms of Service. Use at your own risk.

It’s been a long, long time since I posted something here, and if you missed me, I apologize. This is an update of the Facebook contact info scraper in Python. The old one stopped working when Facebook updated its user interface, and I must say that this is the greatest drawback of writing a scraper that relies on regular expressions.

Yes, using regular expressions in scrapers is generally a bad idea, but for tools like this, I make an exception. Most programming languages nowadays don’t ship with a good enough HTML parser. There are libraries/modules available, like Beautiful Soup in Python; it’s a powerful module, but for this script it was overkill. Regular expressions, however, are just right in my opinion, since this script doesn’t require heavy parsing. Of course some fellow coders will disagree with this paragraph; you’re very welcome to, so let me hear from you in the comments. :)
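
To make the trade-off concrete: a regex like the one the script uses for the hidden form field is short, but it only matches when the attributes appear in exactly that order, while a parser does not care about attribute order. The following Python 3 sketch compares the two on a made-up hidden field named `token`. It uses the standard library’s basic `html.parser`, not Beautiful Soup, and the helper names are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import Dict, Optional


class InputValueParser(HTMLParser):
    """Collect name -> value for every <input> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.values: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            a = dict(attrs)  # attribute values are None when absent
            if a.get("name") is not None:
                self.values[a["name"]] = a.get("value") or ""


def input_value_parser(html: str, name: str) -> Optional[str]:
    p = InputValueParser()
    p.feed(html)
    p.close()
    return p.values.get(name)


def input_value_regex(html: str, name: str) -> Optional[str]:
    # Matches only when name= comes right before value=, with double quotes.
    m = re.search(r'name="%s" value="(\w+)"' % re.escape(name), html)
    return m.group(1) if m else None
```

On `<input type="hidden" name="token" value="abc123">` both return `"abc123"`; if the same tag is written as `<input value="abc123" type="hidden" name="token">`, only the parser still finds it. That is exactly the kind of markup change that breaks regex-based scrapers.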

What’s changed in this version? The script now uses a bit of the Graph API. Too bad the API doesn’t give you your friends’ email addresses; actually it does, but only with a special permission. So we use the Graph API to get the complete list of our Facebook friends. Compared with the previous scraper, this one is a lot faster and more accurate at gathering friend information.

And of course, as you can see from the title, we will only be scraping email addresses. As for the output, we no longer use the elegant HTML/CSS report; instead, we generate a CSV file containing each friend’s name and email.
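
For reference, writing that kind of two-column CSV in Python 3 looks like the sketch below. The script itself is Python 2, which is why it falls back to `repr()` on unusual characters; in Python 3 the `csv` module writes Unicode `str` values directly, as long as the file is opened with `open(path, "w", newline="", encoding="utf-8")`. `write_contacts` and the header row are hypothetical additions.

```python
import csv
from typing import Iterable, TextIO, Tuple


def write_contacts(rows: Iterable[Tuple[str, str]], out: TextIO) -> None:
    """Write (name, email) rows as CSV, preceded by a header row."""
    writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL)
    writer.writerow(["name", "email"])
    # QUOTE_MINIMAL quotes only fields that contain commas, quotes or newlines.
    writer.writerows(rows)
```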

Here’s the Python (2.x) code:

#!/usr/bin/python

'''
	InFB - Information Facebook
	Usage: infb.py user@domain.tld password

http://ruel.me

	Copyright (c) 2011, Ruel Pagayon
	All rights reserved.

	Redistribution and use in source and binary forms, with or without
	modification, are permitted provided that the following conditions are met:
		* Redistributions of source code must retain the above copyright
		  notice, this list of conditions and the following disclaimer.
		* Redistributions in binary form must reproduce the above copyright
		  notice, this list of conditions and the following disclaimer in the
		  documentation and/or other materials provided with the distribution.
		* Neither the name of the author nor the names of its contributors 
		  may be used to endorse or promote products derived from this software 
		  without specific prior written permission.

	THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND ANY 
	EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 
	WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
	DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, 
	INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 
	LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, 
	OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF 
	LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE 
	OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF 
	ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
'''

import sys
import re
import urllib
import urllib2
import cookielib
import csv
import json

def main():
	# Check the arguments
	if len(sys.argv) != 3:
		usage()
	user = sys.argv[1]
	passw = sys.argv[2]
	
	# Initialize the needed modules
	CHandler = urllib2.HTTPCookieProcessor(cookielib.CookieJar())
	browser = urllib2.build_opener(CHandler)
	browser.addheaders = [('User-agent', 'InFB - ruel@ruel.me - http://ruel.me')]
	urllib2.install_opener(browser)
	
	
	# Initialize the cookies and get the post_form_data
	print 'Initializing..'
	res = browser.open('http://m.facebook.com/index.php')
	mxt = re.search('name="post_form_id" value="(\w+)"', res.read())
	pfi = mxt.group(1)
	print 'Using PFI: %s' % pfi
	res.close()
	
	# Initialize the POST data
	data = urllib.urlencode({
		'lsd'				: '',
		'post_form_id'		: pfi,
		'charset_test' 		: urllib.unquote_plus('%E2%82%AC%2C%C2%B4%2C%E2%82%AC%2C%C2%B4%2C%E6%B0%B4%2C%D0%94%2C%D0%84'),
		'email'				: user,
		'pass'				: passw,
		'login'				: 'Login'
	})
	
	# Login to Facebook
	print 'Logging in to account ' + user
	res = browser.open('https://www.facebook.com/login.php?m=m&refsrc=http%3A%2F%2Fm.facebook.com%2Findex.php&refid=8', data)
	html = res.read()
	res.close()
	if not re.search('Logout', html):
		print 'Login Failed'
		
		# For debugging (when login fails): save the page we got back
		fh = open('debug.html', 'w')
		fh.write(html)
		fh.close()
		
		# Exit the execution :(
		sys.exit(2)
	
	# Get Access Token
	res = browser.open('http://developers.facebook.com/docs/reference/api')
	conft = res.read()
	mat = re.search('access_token=(.*?)"', conft)
	acct = mat.group(1)
	print 'Using access token: %s' % acct
	
	# Get friend's ID
	res = browser.open('https://graph.facebook.com/me/friends?access_token=%s' % acct)
	fres = res.read()
	jdata = json.loads(fres)
	
	# Initialize the CSV writer
	fbwriter = csv.writer(open('%s.csv' % user, 'ab'), delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
	
	# Go through each ID in the JSON response
	for acc in jdata['data']:
		fid = acc['id']
		fname = acc['name']
		
		# Go to ID's profile
		res = browser.open('http://m.facebook.com/profile.php?id=%s&v=info&refid=17' % fid)
		xma = re.search('mailto:(.*?)"', res.read())
		if xma:
			
			# Replace the html entity from the scraped information
			email = xma.group(1).replace('&#64;', '@')
			
			# In case there will be weird characters, repr() will help us.
			try:
				print fname, email
			except:
				print repr(fname), repr(email)
				
			# Write to CSV, again with repr() if something weird prints out.
			try:
				fbwriter.writerow([fname, email])
			except:
				fbwriter.writerow([repr(fname), repr(email)])
	
	
def usage():
	'''
		Usage: infb.py user@domain.tld password
	'''
	print 'Usage: ' + sys.argv[0] + ' user@domain.tld password'
	sys.exit(1)
	
if __name__ == '__main__':
	main()

I’ve also updated the gist, so you can fork it anytime here.

Yes, it’s tested and working (for now). It will eventually stop working again, and when it does I’ll update it, so don’t worry. And again, use this at your own risk. If you have questions or comments regarding this script, don’t hesitate to comment below.

Best Regards.
