Scrape Your Facebook Friends’ Contact Info with Python

» 26 November 2010 » In Internet, Open-Source, Python »

UPDATE: This post has been updated as the code here is no longer working. You can find the updated post here.

I coded this script in Perl almost a month ago, but since I’ve been thinking of learning Python, I rewrote it in Python. Basically, this script demonstrates scraping, web crawling, and cookie handling. And of course, it’s free and forkable at Gist. I named the script InFB, my short version of Facebook and Information.

The output of this script is the profile ID, profile name, profile URL, e-mail address, and mobile/phone number (if provided by the friend). One thing to remember though: don’t expect this script to scrape addresses or numbers that your friend has hidden. It only extracts what you can actually see on his/her profile.

For easier page access and scraping, I used the mobile version of Facebook. It’s much lighter and clearer. Besides, I couldn’t find a way to generate the friend list on the full site. Heh.
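To give an idea of what the scraping boils down to, here’s a minimal sketch in Python 3 (the script further down targets Python 2). The HTML fragment is made up and shaped to fit the regular expressions the script uses, so the real mobile page markup may well differ, and extract_contact is just a helper name I picked for illustration.

```python
import re

# Hypothetical fragment, modeled on what the script's regexes expect.
# The real m.facebook.com markup may look different.
SAMPLE = (
    '<div id="body"><div><div>Jane Doe</div></div></div>'
    '<table><tr><td valign="top"><div>Emails:</div></td><td valign="top"><div>'
    'jane&#64;example.com<br />jd&#64;example.org</div></td></tr></table>'
    '<a href="tel:+15550100">+15550100</a>'
)

def extract_contact(html):
    """Return (name, emails, phone), or None when no name or e-mail is visible."""
    name = re.search(r'<div id="body"><div><div>(.*?)</div>', html)
    email = re.search(r'Emails?:</div></td><td valign="top"><div>(.*?)</div>', html)
    if not name or not email:
        return None
    tel = re.search(r'href="tel:(.*?)"', html)
    # Several addresses are separated by <br />, and "@" arrives as &#64;
    emails = email.group(1).replace('<br />', ', ').replace('&#64;', '@')
    return name.group(1), emails, tel.group(1) if tel else ''

print(extract_contact(SAMPLE))
```

Anything the friend has hidden simply isn’t in the page, so the patterns have nothing to match and the function returns None.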

Usage

Using this script is easy: all you need to do is open a terminal or the Windows command prompt and pass your e-mail address and password as arguments. If you leave out the password, the script prompts for it.

infb.py user@domain.tld password

You can also put this in a batch file or shell script (if you have multiple accounts).

Code

UPDATE: Here’s the updated code (forked by gelendir), which uses an HTML parser, a better way to get the data. This code can be found here. And the original one can still be found here.

#!/usr/bin/python
#
#	InFB - Information Facebook
#	Usage: infb.py user@domain.tld password
#	http://ruel.me
#
#	Copyright (c) 2010, Ruel Pagayon - ruel@ruel.me
#	All rights reserved.
#
#	Redistribution and use in source and binary forms, with or without
#	modification, are permitted provided that the following conditions are met:
#		* Redistributions of source code must retain the above copyright
#		  notice, this list of conditions and the following disclaimer.
#		* Redistributions in binary form must reproduce the above copyright
#		  notice, this list of conditions and the following disclaimer in the
#		  documentation and/or other materials provided with the distribution.
#		* Neither the name of ruel.me nor the names of its contributors
#		  may be used to endorse or promote products derived from this
#		  script without specific prior written permission.
#
#	THIS SCRIPT IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
#	ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
#	WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
#	DISCLAIMED. IN NO EVENT SHALL RUEL PAGAYON BE LIABLE FOR ANY
#	DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
#	(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
#	LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
#	ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
#	(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
#	SCRIPT, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


import sys, re, urllib, urllib2, cookielib, HTMLParser, getpass

class FormScraper(HTMLParser.HTMLParser):
    """
    Scrapes the Facebook login page for form values that need to be submitted on login.
    Necessary because the form values change each time the login page is loaded.

    Usage:
    form_scraper = FormScraper()
    form_scraper.feed(html_from_facebook)
    form_values = form_scraper.values
    """

    def __init__(self, *args, **kwargs):
        HTMLParser.HTMLParser.__init__(self, *args, **kwargs)
        self.in_form = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        tag = tag.lower()
        attrs = dict(attrs)

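        # Use .get() so tags without an id/type attribute don't raise KeyError
        if tag == 'form' and attrs.get('id') == 'login_form':
            self.in_form = True
        elif self.in_form and tag == 'input' and attrs.get('type') == 'hidden':
            if 'name' in attrs:
                self.values.append( (attrs['name'], attrs.get('value') or '') )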

    def handle_endtag(self, tag):
        if tag.lower() == 'form' and self.in_form:
            self.in_form = False

def main():
    if len(sys.argv) < 2:
        usage()
    
    user = sys.argv[1]

    if len(sys.argv) < 3:
        passw = getpass.getpass("Enter password: ")
    else:
        passw = sys.argv[2]

    # Set needed modules
    CHandler = urllib2.HTTPCookieProcessor(cookielib.CookieJar())
    browser = urllib2.build_opener(CHandler)
    browser.addheaders = [('User-agent', 'InFB - ruel@ruel.me - http://ruel.me')]
    urllib2.install_opener(browser)

    #Retrieve login form data and initialize the cookies
    print 'Initializing..'
    res = browser.open('https://www.facebook.com/login.php')

    #Determine string encoding
    content_type = res.info()['Content-Type'].split('; ')
    encoding = 'utf-8'
    if len(content_type) > 1 and content_type[1].startswith('charset'):
        encoding = content_type[1].split('=')[1]
    html = unicode( res.read(), encoding=encoding )
    res.close()

    #scrape form for hidden inputs, add email and password to values
    form_scraper = FormScraper()
    form_scraper.feed(html)
    form_data = form_scraper.values
    form_data.extend( [('email', user), ('pass', passw)] )
    #HACK: urlencode doesn't like strings that aren't encoded with the 'encode' function.
    #Using html.encode(encoding) doesn't help either. why ??
    form_data = [ ( x.encode(encoding), y.encode(encoding) ) for x,y in form_data ]
    data = urllib.urlencode(form_data)

    # Login
    print 'Logging in to account ' + user
    res = browser.open('https://login.facebook.com/login.php?login_attempt=1', data)
    rcode = res.code
    print rcode
    print res.url
    if not re.search(r'home\.php$', res.url):
        print 'Login Failed'
        sys.exit(2)
    res.close()

    # Get Emails and Phone Numbers
    print "Getting Info..\n"
    flog = open(user + '.html', 'w')  # overwrite: appending would stack a second <html> document on re-runs
    flog.write("<html>\n\t<head>\n\t\t<title>InFB - " + user + "</title>\n\t\t<link href=\"infb.css\" rel=\"stylesheet\" type=\"text/css\" />\n\t</head>\n\t<body>\n\t\t<div class=\"rby\">\n\t\t\t<table class=\"flist\">\n\t\t\t\t")
    page = 0
    while True:
        res = browser.open('http://m.facebook.com/friends.php?a&f=' + str(page))
        parp = res.read()
        m = re.findall('"\/friends\.php\?id=([0-9]+)&', parp)
        res.close()
        for i in m:
            prof = 'http://m.facebook.com/profile.php?id=' + i + '&v=info'
            res = browser.open(prof)
            cont = res.read()
            res.close()
            prof = prof.replace('m.', 'www.')
            ms = re.search('<div id="body"><div><div>(.*?)<\/div>', cont)
            if ms:
                name = ms.group(1)
            else:
                continue
            ms = re.search('href="tel:(.*?)"', cont)
            if ms:
                tel = ms.group(1)
            else:
                tel = ''
            ms = re.search('Emails?:<\/div><\/td><td valign="top"><div>(.*?)<\/div>', cont)
            if ms:
                email = re.sub('<br \/>', ', ', ms.group(1)).replace('&#64;', '@')
            else:
                continue
            print name + ' : ' + email + ' ' + tel
            flog.write("<tr class=\"lbreak\">\n\t\t\t\t\t<td class=\"num\">" + i + "</td><td class=\"fname\"><a href=\"" + prof + "\" title=\"" + name + "\">" + name + "</a></td><td class=\"fmail\">" + email + "</td><td class=\"cnum\">" + tel + "</td>\n\t\t\t\t\t</tr>\n\t\t\t\t")
        if re.search('Next', parp):
            page += 10
        else:
            break
    flog.write("\n\t\t\t</table>\n\t\t</div>\n\t</body>\n</html>")
    flog.close()

def usage():
    print 'Usage: ' + sys.argv[0] + ' user@domain.tld [password]'
    sys.exit(1)

if __name__ == '__main__':
    main()
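The script above is Python 2 (urllib2, cookielib, and the HTMLParser module don’t exist under those names in Python 3). As a rough sketch of how the form-scraping part could look in Python 3, here’s the same idea using html.parser. The sample HTML and the field names lsd and charset_test are placeholders I made up for illustration, and the form_id parameter is my own addition, not part of the original class.

```python
from html.parser import HTMLParser

class FormScraper(HTMLParser):
    """Collect (name, value) pairs from hidden inputs inside one form."""

    def __init__(self, form_id='login_form'):
        super().__init__()
        self.form_id = form_id  # hypothetical parameter, defaults to the id used above
        self.in_form = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form' and attrs.get('id') == self.form_id:
            self.in_form = True
        elif self.in_form and tag == 'input' and attrs.get('type') == 'hidden':
            if 'name' in attrs:
                self.values.append((attrs['name'], attrs.get('value') or ''))

    def handle_endtag(self, tag):
        if tag == 'form':
            self.in_form = False

# Made-up page: only hidden inputs inside the login form should be collected.
SAMPLE = '''
<form id="search"><input type="hidden" name="q" value="x"></form>
<form id="login_form">
  <input type="hidden" name="lsd" value="abc123">
  <input type="text" name="email">
  <input type="hidden" name="charset_test" value="ok">
</form>
'''

scraper = FormScraper()
scraper.feed(SAMPLE)
print(scraper.values)
```

The collected pairs can then be extended with the e-mail and password and passed to urllib.parse.urlencode, which takes a list of tuples just like the Python 2 urllib.urlencode call above.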

CSS

The script generates an HTML log file, styled with CSS so you can customize the output (save the stylesheet as infb.css):

/*
	Ruel Pagayon (c) 2010 - ruel@ruel.me
	
	Cascading Style Sheet for InFB Log Output.
*/
body {
	background-color: #3C3C3C;
	color: #FFF;
	margin-top: 50px;
	margin-left: 25px;
	font-size: xx-small;
	font-family: Calibri, Arial, sans-serif;
}
.rby {
	text-align: center;
	font-size: xx-small;
}

table  {
	text-align: center;
}

td {
	padding-top: 0.5em;
	padding-bottom: 0.5em;
	padding-left: 1em;
	padding-right: 1em;
	text-align: left;
	font-size: small;
}

td.num {
	color: #CCC;
}

td.cnum {
	color: #AFAFAF;
}

a:active, a:visited, a:link  {
	color: #FFF;
	font-weight: bold;
	text-decoration: none;
}

a:hover {
	color: #FFF;
	font-weight: bold;
	text-decoration: underline;
}
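The class names in the stylesheet (lbreak, num, fname, fmail, cnum) are the ones the script writes into each table row. The script builds those rows by plain string concatenation, so a name containing a quote or an ampersand ends up in the HTML unescaped. Here’s a small Python 3 sketch of one way to build a row with escaping; friend_row is a helper name I made up, not something from the script.

```python
from html import escape

def friend_row(profile_id, name, url, email, tel=''):
    """Build one <tr> for the log, escaping every value before it goes into the HTML."""
    pid, name, url, email, tel = (
        escape(str(v)) for v in (profile_id, name, url, email, tel)
    )
    return (
        '<tr class="lbreak">'
        '<td class="num">{pid}</td>'
        '<td class="fname"><a href="{url}" title="{name}">{name}</a></td>'
        '<td class="fmail">{email}</td>'
        '<td class="cnum">{tel}</td>'
        '</tr>'
    ).format(pid=pid, name=name, url=url, email=email, tel=tel)

row = friend_row(
    42,
    'Ann "AJ" & Co',
    'http://www.facebook.com/profile.php?id=42&v=info',
    'ann@example.com',
)
print(row)
```

html.escape converts quotes by default, so the value is safe inside the title and href attributes as well as in the cell text.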

Again, if you want to suggest changes, you can apply them directly by forking the script on Gist (or by forking gelendir’s version), or simply drop a comment below. Thank you.

Disclaimer

This is against Facebook TOS. Use at your own risk.

  • http://pulse.yahoo.com/_HHA55IHLCY6TSGH2UG4TX3WDJ4 Jerry

    Looks interesting. How do I download this to try it ?
    I don’t know Gist ?

    • http://ruel.me Ruel

      Hello Jerry, you can just copy and paste the code and save it as infb.py. Alongside it, copy and paste the CSS into infb.css.

      For the script to run, you need to download the Python interpreter. http://www.activestate.com/activepython

      If you have any problems, just reply to this comment. :)

      • http://pulse.yahoo.com/_HHA55IHLCY6TSGH2UG4TX3WDJ4 Jerry

        Hi, Yes I understand I can just copy paste, I did that but I get this type of error:

        ./facebook_scrape.py
        File “./facebook_scrape.py”, line 6
        ����if len(sys.argv) != 3:
        ^
        SyntaxError: invalid syntax

        So I’d like to get the raw ASCII source.

        I know I could simply remove non ASCII characters with Perl or Python etc. But I thought you may have a Download area that is all.

        Cheers,

        Jerry

        • http://ruel.me Ruel

          I see, you can download that here: https://gist.github.com/gists/716622/download

          • http://pulse.yahoo.com/_HHA55IHLCY6TSGH2UG4TX3WDJ4 Jerry

            HI Ruel,

            Works like a charm!

            Really cool ;-)

            Thanks!

            Jerry

          • http://ruel.me Ruel

            Glad you had it working. :) Enjoy.

          • Oscar Gim

            Hi, I uploaded the files infb.css and infb.py to my server and then ran it via PuTTY by typing:

            python infb.py myuser@mydomain.com myfacebookpassword

            and i get this:

            Initializing..
            Logging in to account myuser@mydomain.com
            Login Failed
            Traceback (most recent call last):
            File “infb.py”, line 113, in ?
            main()
            File “infb.py”, line 65, in main
            exit(2)
            TypeError: ‘str’ object is not callable

            don’t know what can be wrong :/

          • nour

            I get this message, and I don’t know how this program works.

            Thank you for your help
            /Bureau/python/craw$ python craw.py
            Usage: craw.py user@domain.tld [password]

  • http://twitter.com/_Kingster_ Andrew King

    What version of python is this? must be pre-3, as urllib2 doesn’t exist in 3… Certainly doesn’t run in 3.1.2, the latest stable.

    • http://ruel.me Ruel

      Actually, Python 2.7 :)

      I won’t use Python 3 until major packages will be ported to it. And OS distros will include it in the default installation. 2.7 is working just fine. :)

  • http://pulse.yahoo.com/_HHA55IHLCY6TSGH2UG4TX3WDJ4 Jerry

    BTW, now you could write an article on how to fork Gist for those of us who do not use Gist or Git ;-)

    • http://ruel.me Ruel

      Sure thing, well actually I’m also new to those, heh. But I will write if I have something to share about it.

  • http://kevin.ryan.myopenid.com/ ktr

    Not that it matters, but you might also consider using ‘getpass’ instead of using 2nd param as password (e.g., import getpass; passw = getpass.getpass("Password: ")). Thx for the script!

    • http://ruel.me Ruel

      Thanks for your suggestion. Gelendir added that one. :)

      • gelendir

        I actually added that after reading ktr’s comment XD. I didn’t even know that a getpass module existed in python. Neat !

  • gelendir

    Nice idea ! I don’t know where you live, but from Canada I had string encoding problems and login errors when using the script. I forked your gist and added a few tweaks of my own, and hey, it works now : https://gist.github.com/718149

    • http://ruel.me Ruel

      Alright, thanks, and glad you improved it. :)
      I’ll update the source here.

  • http://www.saktidwicahyono.name Sakti Dwi Cahyono

    I only get 9 contacts from my friends list

    • http://ruel.me Ruel

      What do you see on this page? http://m.facebook.com/friends.php?a&f=0 How many contacts do you have all in all?

      • http://www.saktidwicahyono.name Sakti Dwi Cahyono

        10 per page. there are 42 pages

        • http://ruel.me Ruel

          It’s probably because of your Facebook language. The script currently works with English only, so I assume you have a different language set. :)

          For a temporary fix, change the word ‘Next’ to the equivalent word in your language, in line 145.

  • Ted

    Hi there! Great piece of code! Do you think that it would be possible to crawl all members of a group you are an admin?

    • http://ruel.me Ruel

      Yes, that’s possible!

      • Ted

        I am trying to implement this, but I haven’t figured out yet how to do it. The scraping should start from “http://www.facebook.com/groups/edit.php?edit_members&gid=xxxx” but the page is structured differently than friends.php.. Have you tried it?

        • http://ruel.me Ruel

          It won’t work for the script above, but the idea/concept itself is possible. You have to modify the regular expression, and you’re done.

          • Ted

            I know I will need to modify the regular expression, but I think the group members page dont follow the same logic we find in friends.php with a number of friends per static page. The group list is kinda dynamic. or am I wrong?

          • http://ruel.me Ruel

            All you need to fetch actually is the profile ID of the group members, then the rest works as the above script.

          • Ted

            it works.. but! hm.. It seems you cannot get more than 6,000 members from a group. Facebook is not listing more than that.. sh*t!
            it will stop at http://www.facebook.com/social_graph.php?node_id=xxx&start=5990

  • Guest

    is the script not working anymore???

    • http://ruel.me Ruel

      It’s still working. :)

  • Wiz Ziw61

    How simple would it be to get just a list of my own friends, and then the number of friends each of them have? I guess a count of the number of friends we have in common would be nice. I just want to remove friends that are not really active… nothing sneaky! Sorry to be such a noob, but I don’t want to experiment & get booted! THANKS!

    • http://ruel.me Ruel

      Counting mutual friends will be easy. But removing them, that I haven’t tested yet. It might catch Facebook’s attention and get your account investigated.

      • Wiz Ziw61

        My bad, I will manually remove them — I just want a list to work from.

  • User222

    Facebook has changed the “http://m.facebook.com/friends.php?a&f=’” page… not possible to crawl anymore.. :S

    • http://ruel.me Ruel

      Yeah, but oh well.. I’ll update it later. :D

      • User222

        hah! lets try to figure how to fix this! I can offer some help if possible.. this change has ruined my project ;)

        • sabiru

          hi, did you succeed in resolving this problem?

  • Bloodseecar

    i want to know my friends facebook account email ID and he hide this ID from me but i still want to find can any one help me please ?

  • http://www.facebook.com/saifuddin.slatewala Saifuddin Shabbir Slatewala

    Is this still working? It logs in, but no contacts are extracted.

  • luca

    how to escape the ‘>’ character in the password?

  • Gotaquestion

    how do you successfully utilize this script without being flagged for abusive behavior is there a timing feature? i remember a similar program before that did this, have you considered a gui version?

  • http://www.facebook.com/profile.php?id=28118280 Alfred Inacio

    Does this still work?

  • Pingback: Scrape Your Facebook Friends’ Emails with Python

  • Idonthaveanemail

    This script works great, but it has a problem if the “Login Notifications: When an unrecognized computer or device tries to access my account: send me an email” option is checked. I made a little patch for my case, but you might want to consider fixing it.

  • Gangstar0

    Initializing..
    Logging in to account *******@live.com
    200
    http://www.facebook.com/home.php
    Getting Info..

    this is wot i get any help