Twitter Scan class using PHP5 and Curl (Filters ReTweets)

The other day I started working on a Twitter project that needs to scan Twitter’s public timeline and fetch a certain number of tweets from a certain user. On top of that, the tweets need to be filtered by hashtag.

After browsing the net for a little while I found a fantastic class on scriptplayground.

I implemented it in my project and it worked like a charm... until I wanted to filter out ReTweets. I simply couldn’t find any additional parameter for the Twitter Search API that could do that for me. I tried putting -RT and similar operators into the search call, but it didn’t work out for me.

Maybe I did something wrong. If so, I would highly appreciate a comment that widens my horizon!

So I decided to extend the class myself and provide it for anyone who might find it useful.

So here’s the class (Twitter.php) with some comments and my modifications.

<?php
class Twitter
{
    private $tweets = array();
    private $user;
    private $limit;
    private $original_limit;
    private $hashtag;
    private $retweet_array = array();
    private $old_number_of_retweets = 0;

    // $hashtag gets a default so that no required parameter follows an optional one
    public function __construct($user, $limit = 5, $hashtag = '')
    {
        $this->user = $user;
        $this->limit = $limit;
        $this->original_limit = $limit;
        $this->hashtag = $hashtag;
        $this->retrieve_tweets();
    }

    private function retrieve_tweets()
    {
        $this->user = str_replace(' OR ', '%20OR%20', $this->user);
        $feed = curl_init('http://search.twitter.com/search.atom?q=from:'. $this->user .'&rpp='. $this->limit.'&tag='. $this->hashtag);
        curl_setopt($feed, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($feed, CURLOPT_HEADER, 0);
        curl_setopt($feed, CURLOPT_TIMEOUT, 6);
        $xml = curl_exec($feed);
        curl_close($feed);

        // Give up quietly if the request failed or returned nothing
        if ($xml === false || $xml === '')
        {
            return;
        }

        $result = new SimpleXMLElement($xml);
        foreach($result->entry as $entry)
        {
            $tweet = new stdClass();
            $tweet->id = (string) $entry->id;
            // The code below assumes the author name has the form "username (Real Name)"
            $name = (string) $entry->author->name;
            $user = explode(' ', $name);
            $tweet->user = (string) $user[0];
            $tweet->author = (string) substr($name, strlen($user[0])+2, -1);
            $tweet->title = (string) $entry->title;
            $tweet->content = (string) $entry->content;
            $tweet->updated = (int) strtotime((string) $entry->updated);
            $tweet->permalink = (string) $entry->link[0]->attributes()->href;
            $tweet->avatar = (string) $entry->link[1]->attributes()->href;

            // Check if it is a retweet ("RT " anywhere in the title counts)
            $isRetweet = strpos($tweet->title, "RT ");

            if( $isRetweet !== false )
            {
                // It is a ReTweet!
                // Write the title in the ReTweet array for later checking
                array_push( $this->retweet_array, $tweet->title );
            }
            else
            {
                array_push($this->tweets, $tweet);
            }
        }

        // Now check how many entries are in the retweet_array
        $number_of_retweets = count($this->retweet_array);

        // We are done when the tweets array has at least the desired number of entries OR if we couldn't find new retweets
        if ( count($this->tweets) >= $this->original_limit || $this->old_number_of_retweets == $number_of_retweets )
        {
            // That's it!
            // Now we have an array that does not contain ReTweets

            // But we may have more than we wanted, because the last run might have found more tweets
            // than the gap between count($this->tweets) and the original_limit.
            // So check for that and remove the last ones if necessary.

            // Difference between the number of "valid" tweets that actually made it into the array and the original limit
            $difference = count($this->tweets) - $this->original_limit;

            if ( $difference > 0 )
            {
                for ($i = 0; $i < $difference; $i++)
                {
                    array_pop( $this->tweets );
                }
            }

            // Do additional filtering here if you need to...
        }
        else
        {
            // Not enough tweets yet: ask for as many extra entries as we found retweets and try again
            $this->limit += $number_of_retweets;
            $this->tweets = array();
            $this->retweet_array = array();
            $this->old_number_of_retweets = $number_of_retweets;
            $this->retrieve_tweets();
        }

        unset($feed, $xml, $result, $tweet);
    }

    public function getTweets()
    {
        return $this->tweets;
    }

}
?>

It should be pretty self-explanatory. If it isn’t, let me know and I might be able to help you out with anything that’s not so clear.
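One detail worth a closer look is the retweet check: the class treats any title containing "RT " as a retweet, which would also catch a word that merely ends in "RT". Below is a minimal sketch of a stricter check. The helper name is_retweet() is hypothetical and not part of the class, and it assumes old-style retweets of the form "RT @user: text", possibly preceded by a comment.

```php
<?php
// Hypothetical helper, not part of the original class: a stricter retweet check.
// Assumption: old-style retweets look like "RT @user: text", optionally after a comment.
function is_retweet(string $title): bool
{
    // "RT" must be a standalone token, followed by whitespace and an @mention.
    return preg_match('/(?:^|\s)RT\s+@\w+/', $title) === 1;
}

var_dump(is_retweet('RT @traveldudes: Great photos from Queenstown')); // bool(true)
var_dump(is_retweet('Nice one! RT @traveldudes: Great photos'));       // bool(true)
var_dump(is_retweet('START here for New Zealand travel tips'));        // bool(false)
```

The last title contains the substring "RT " and would be dropped by a plain strpos() check, so whether the stricter pattern is worth it depends on the tweets you expect.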

The one thing that I want to mention at this point is that you can enter an arbitrary number for $limit and it will work! So if your system becomes dynamic later on and a user tries to get 136 tweets although only 3 are available, it won’t crash.
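To see why an oversized $limit terminates, here is a rough sketch of the same top-up idea as a loop over a fake data source instead of the Twitter API. The function collect_non_retweets() and the $fetch callable are hypothetical names used only for illustration; the stop condition mirrors the one in the class.

```php
<?php
// Sketch of the "top-up" logic with a fake data source instead of the Twitter API.
// $fetch is a hypothetical callable: given a limit, it returns up to that many titles.
function collect_non_retweets(callable $fetch, int $wanted): array
{
    $limit = $wanted;
    $previous_retweets = 0;

    while (true) {
        $tweets = array();
        $retweets = 0;

        foreach ($fetch($limit) as $title) {
            if (strpos($title, 'RT ') !== false) {
                $retweets++;
            } else {
                $tweets[] = $title;
            }
        }

        // Stop when we have enough, or when asking for more turned up no new retweets.
        if (count($tweets) >= $wanted || $retweets === $previous_retweets) {
            return array_slice($tweets, 0, $wanted);
        }

        // Otherwise ask for as many extra entries as we just had to throw away.
        $limit += $retweets;
        $previous_retweets = $retweets;
    }
}

// Made-up timeline with only three entries, one of them a retweet.
$timeline = array('one', 'RT @a: two', 'three');
$fetch = function (int $limit) use ($timeline): array {
    return array_slice($timeline, 0, $limit);
};

print_r(collect_non_retweets($fetch, 136)); // the two non-retweets, no endless loop
```

With 136 requested and only 3 available, the second pass finds the same number of retweets as the first, so the loop stops and returns what it has.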

Here is a little example of how to use it:

$twitter_username_string = "DanTheMan_nz OR traveldudes OR somemore";
$twitter_number_of_tweets = 5;
$twitter_hashtags = "travel+OR+newzealand+OR+news";

// Instantiate a Twitter object which will scan Twitter depending on the parameters
$twitter = new Twitter($twitter_username_string, $twitter_number_of_tweets, $twitter_hashtags);
$tweets = $twitter->getTweets();

foreach ( $tweets as $tweet )
{
    echo "Title: " . $tweet->title . "\n";
}
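Note that the example joins user names with " OR " and hashtags with "+OR+", because the class only encodes the user string. As a sketch of an alternative, the query string could be built from plain arrays and encoded in one place. The helper build_search_url() is hypothetical and not part of the class; the endpoint and the q, rpp and tag parameters are the ones the class already uses.

```php
<?php
// Hypothetical helper: builds the same search URL the class requests,
// but leaves all encoding to http_build_query().
// Assumption: users and hashtags are passed as plain arrays, not pre-joined strings.
function build_search_url(array $users, int $limit, array $hashtags): string
{
    $query = array(
        'q'   => 'from:' . implode(' OR ', $users),
        'rpp' => $limit,
        'tag' => implode(' OR ', $hashtags),
    );

    // PHP_QUERY_RFC3986 encodes spaces as %20 (the default encoding would use +).
    return 'http://search.twitter.com/search.atom?'
        . http_build_query($query, '', '&', PHP_QUERY_RFC3986);
}

echo build_search_url(array('DanTheMan_nz', 'traveldudes'), 5, array('travel', 'news')), "\n";
```

The colon after "from" ends up percent-encoded as %3A, which a server decodes back to ":" like any other escaped character.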

Of course you have access to more details about each Tweet. Just take a second look at the Twitter.php class and you can see we have:

$tweet->id
$tweet->user
$tweet->author
$tweet->title
$tweet->content
$tweet->updated
$tweet->permalink
$tweet->avatar

That should get you somewhere!
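For instance, the fields listed above could be turned into a small HTML snippet. This is only a sketch: render_tweet() is a hypothetical helper, the markup is arbitrary, and the sample tweet below is made-up data in the shape the class produces.

```php
<?php
// Hypothetical helper: turns one tweet object (as built by the class) into an HTML snippet.
function render_tweet(stdClass $tweet): string
{
    // Escape everything that came from Twitter before it goes into the page.
    $escape = function (string $value): string {
        return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
    };

    return sprintf(
        '<p><a href="%s">%s</a>: %s <small>%s</small></p>',
        $escape($tweet->permalink),
        $escape($tweet->user),
        $escape($tweet->title),
        gmdate('Y-m-d H:i', $tweet->updated) // $tweet->updated is a Unix timestamp
    );
}

// Made-up sample data, only for illustration.
$tweet = new stdClass();
$tweet->user = 'traveldudes';
$tweet->title = 'Fish & chips in Kaikoura';
$tweet->updated = 0;
$tweet->permalink = 'http://twitter.com/traveldudes/statuses/1';

echo render_tweet($tweet), "\n";
```

Escaping matters here because tweet titles are free text and may contain characters such as & or <.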

I hope someone finds it useful, and thanks for reading.

  1. May 13th, 2011 at 16:57 | #1

    Update: I forgot 3 lines of code in the last if clause. At the end, where we check whether we’ve reached the desired number of tweets, we need to do the following:

    if ( count($this->tweets) >= $this->original_limit || $this->old_number_of_retweets == $number_of_retweets )
    {
        // That's it!

        // Now we have an array that does not contain ReTweets

        // But we may have more than we wanted, because the last run might have found more tweets
        // than the gap between count($this->tweets) and the original_limit.
        // So check for that and remove the last ones if necessary.

        // Difference between the number of "valid" tweets that actually made it into the array and the original limit
        $difference = count($this->tweets) - $this->original_limit;

        if ( $difference > 0 )
        {
            for ($i = 0; $i < $difference; $i++)
            {
                array_pop( $this->tweets );
            }
        }

        // Do additional filtering here if you need to...
    }
    else
    {

    }
    }

    I updated the code already so it should work just fine now.
