Quick and Dirty Method to Search for all Relevant New York Times Article Comments and Classify Text. First I want to filter comments to only those articles that were the most popular articles for that week.
def emailed_results(data):
for i, document in enumerate(data):
return data['results']
def parsed_mailed(data):
mailed = []
for b in data:
dic = {}
dic['sub-title'] = b['abstract']
dic['byline'] = b['byline']
dic['column'] = b['column']
dic['type'] = b['des_facet']
dic['date'] =b['published_date']
dic['section'] = b['section']
dic['title'] = b['title']
dic['url'] = b['url']
mailed.append(dic)
return mailed
def title(data):
for i, title in enumerate(d['title'] for d in data):
print i,title
def url(data):
for i, url in enumerate(d['url'] for d in data):
print i,url
def most_mailed(days, api):
import urllib
import json
bucket = 'http://api.nytimes.com/svc/mostpopular/v2/mostemailed/all-sections/'
string = bucket+days+api
response_string = urllib.urlopen(string).read()
response_dictionary = json.loads(response_string)
results = emailed_results(response_dictionary)
parsed_results = parsed_mailed(results)
titles = title(parsed_results)
urls = url(parsed_results)
return titles, urls
#past seven days
days='7?'
api='api-key=##########'
#call the function
most_mailed(days, api)
Title | url |
---|---|
To Lose Weight, Eating Less Is Far More Important Than Exercising More | |
How to Pick a Cellphone Plan for Traveling Abroad | |
Naomi Oreskes, a Lightning Rod in a Changing Climate | |
How to Make Online Dating Work | |
America’s Seniors Find Middle-Class ‘Sweet Spot’ | |
Experts on Aging, Dying as They Lived | |
What It’s Like as a ‘Girl’ in the Lab | |
Three Simple Rules for Eating Seafood | |
Pope Francis, in Sweeping Encyclical, Calls for Swift Action on Climate Change | |
Stop Revering Magna Carta | |
Review: Pixar’s ‘Inside Out’ Finds the Joy in Sadness, and Vice Versa | |
Cardinals Investigated for Hacking Into Astros’ Database | |
In Tucson, an Unsung Architectural Oasis | |
Magna Carta, Still Posing a Challenge at 800 | |
Democrats Being Democrats | |
Black Like Who? Rachel Dolezal’s Harmful Masquerade | |
A Sea Change in Treating Heart Attacks | |
In ‘Game of Thrones’ Finale, a Breakdown in Storytelling | |
My Choice for President? None of the Above | |
The Family Dog |
df = []
for b in articles:
initial_df = nytimes(b)
df = df + initial_df
print 'Processing ' + str(b) + '...'
Processing http://www.nytimes.com/2015/06/16/upshot/to-lose-weight-eating-less-is-far-more-important-than-exercising-more.html...
Processing http://www.nytimes.com/2015/06/21/travel/how-to-pick-a-cellphone-plan-for-traveling-abroad.html...
Processing http://www.nytimes.com/2015/06/16/science/naomi-oreskes-a-lightning-rod-in-a-changing-climate.html...
Processing http://www.nytimes.com/2015/06/14/opinion/sunday/how-to-make-online-dating-work.html...
Processing http://www.nytimes.com/2015/06/15/business/economy/american-seniors-enjoy-the-middle-class-life.html...
Processing http://opinionator.blogs.nytimes.com/2015/06/19/depressed-try-therapy-without-the-therapist/...
Processing http://opinionator.blogs.nytimes.com/2015/06/17/experts-on-aging-dying-as-they-lived/...
Processing http://www.nytimes.com/2015/06/21/health/saving-heart-attack-victims-stat.html...
Processing http://www.nytimes.com/2015/06/18/opinion/what-its-like-as-a-girl-in-the-lab.html...
Processing http://www.nytimes.com/2015/06/19/world/europe/pope-francis-in-sweeping-encyclical-calls-for-swift-action-on-climate-change.html...
Processing http://www.nytimes.com/2015/06/14/opinion/three-simple-rules-for-eating-seafood.html...
Processing http://www.nytimes.com/2015/06/15/opinion/stop-revering-magna-carta.html...
Processing http://www.nytimes.com/2015/06/19/movies/review-pixars-inside-out-finds-the-joy-in-sadness-and-vice-versa.html...
Processing http://www.nytimes.com/2015/06/21/opinion/sunday/is-your-boss-mean.html...
Processing http://www.nytimes.com/2015/06/17/sports/baseball/st-louis-cardinals-hack-astros-fbi.html...
Processing http://www.nytimes.com/2015/06/14/travel/in-tucson-an-unsung-architectural-oasis.html...
Processing http://www.nytimes.com/2015/06/15/world/europe/magna-carta-still-posing-a-challenge-at-800.html...
Processing http://www.nytimes.com/2015/06/15/opinion/paul-krugman-democrats-being-democrats.html...
Processing http://www.nytimes.com/2015/06/16/opinion/rachel-dolezals-harmful-masquerade.html...
Processing http://www.nytimes.com/2015/06/16/arts/television/in-game-of-thrones-finale-a-breakdown-in-storytelling.html...
We have collected all the comments from all the most-emailed articles of the past seven days
len(df)
7092
df[0]
{'comment': u'The only way to lose weight permanently and arrive at a normal weight (117 pounds for me) is to create a calorie deficit and then a calorie equilibrium once that goal is achieved.
Thirty-five hundred calories eaten but not burned is equal to one pound of fat gained and 3,500 calories burned, but not consumed is equal to one pound of fat lost. Obviously eating 3,500 calories is a lot quicker than burning them. It would take 35 miles of walking to do that.
Fifteen years ago I was obese and about 70 pounds heavier, but I have maintained a normal weight since losing my excess weight in the year 2,000.
After studying the long-term research results, I started free weight loss groups of people who were suffering from my problem. We logged our daily calories, exercise and weight with each other. I still do it and you can join me if you are motivated and have less than 100 pounds to lose. You can see our results and my before and after pictures on www.permanentweightloss.org.
Let me know if this interests you by emailing me at russellk100@gmail.com. See you lighter soon?',
'comment_type': u'comment',
'date': u'1434435437',
'editorsSelection': 0,
'email': u'russellk100@gmail.com',
'location': u'New York City',
'login': None,
'name': u'Roberta Russell',
'recommend': 23,
'replies': [],
'update_date': u'1434435486'}
I am kind of curious about recommended comments and those comments selected by the editor
import numpy as np
np.mean([b['editorsSelection'] for b in df])
0.025803722504230117
np.mean([b['recommend'] for b in df])
13.1056119571348
time=np.asarray([b['date'] for b in df]).astype(float)
recommendations=np.asarray([b['recommend'] for b in df]).astype(float)
import matplotlib.pyplot as plt
import seaborn
model = zip( time, recommendations)
model.sort()
sliding = []
windowSize = 100
tSum = sum([x[0] for x in model[:windowSize]])
rSum = sum([x[1] for x in model[:windowSize]])
for i in range(windowSize,len(model)-1):
tSum += model[i][0] - model[i-windowSize][0]
rSum += model[i][1] - model[i-windowSize][1]
sliding.append((tSum*1.0/windowSize,rSum*1.0/windowSize))
X = [x[0] for x in sliding]
Y = [x[1] for x in sliding]
plt.plot(X, Y)
plt.title("Recommendations over Time", fontsize=15)
from textblob import TextBlob
text=[b['comment'] for b in df]
text=[word.strip("!\"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~").lower() for word in text]
blobs=[]
for b in text:
blobs.append(TextBlob(b).sentiment.polarity)
blobs[:10]
[0.03712121212121212,
0.048571428571428585,
0.2380952380952381,
0.08571428571428573,
0.09743589743589742,
-0.8,
0.09166666666666667,
0.25,
0.08240740740740739,
0.09757142857142857]
While there is a Python wrapper for the New York Times Article Search API, getting article comments necessitates the use of the New York Time API directly. Compared to the Article Search wrapper, there are additional differences in parsing the results. In this example we are querying for the term 'misconduct' in an article's body, byline, and headline and filter this through those articles that contain Facebook in the headline and where the source is either Reuters, The AP, or the New York Times.
from nytimesarticle import articleAPI
api=articleAPI('API-key')
articles = api.search( q = 'misconduct',
fq = {'headline':'Facebook', 'source':['Reuters','AP', 'The New York Times']},
begin_date = 20140101 )
articles
{u'copyright': u'Copyright (c) 2013 The New York Times Company. All Rights Reserved.',
u'response': {u'docs': [{u'_id': u'542d8c8638f0d87d7534ce9e',
u'abstract': u'Facebook pledges that future research on its 1.3 billion users will be subjected to greater internal scrutiny from top managers, especially if it is focused on personal topics; pledge follows public backlash after company undertook study that used its newsfeed to manipulate the emotions of some users without telling them; declines to disclose particulars of new research guidelines.',
u'blog': [],
u'byline': {u'contributor': u'',
u'original': u'By VINDU GOEL',
u'person': [{u'firstname': u'Vindu',
u'lastname': u'GOEL',
u'organization': u'',
u'rank': 1,
u'role': u'reported'}]},
u'document_type': u'article',
u'headline': {u'main': u'Facebook Promises Deeper Review of User Research, but Is Short on the Particulars',
u'print_headline': u'Facebook Vow on Research Is Short on the Particulars'},
u'keywords': [{u'is_major': u'Y',
u'name': u'organizations',
u'rank': u'1',
u'value': u'Facebook Inc'}
def parse(articles):
news = []
for i in articles['response']['docs']:
dic = {}
dic['headline'] = i['headline']
dic['date'] = i['pub_date'][0:10]
news.append(dic)
return news
data=parse(articles)
data[0]
{'date': u'2014-10-03',
'headline': {u'main': u'Facebook Promises Deeper Review of User Research, but Is Short on the Particulars',
u'print_headline': u'Facebook Vow on Research Is Short on the Particulars'}}
Now we will use the New York Times Community API to get comments from specific articles. Here we will get comments for David Brook's Sunday Op-Ed column, "The Moral Bucket List." The API retrieves only 25 comments per call; therefore a loop may be needed to retrieve all comments. I just happened to pick this article because it was one of the most e-mailed articles this past week. However you should definitely check out the documentation for a better understanding. To retrieve comments associated with a specific NYTimes.com URL, use the following URI structure:
bucket='http://api.nytimes.com/svc/community/v3/user-content/recent.json?api-key=you-API-number&url=http://www.nytimes.com/2015/04/12/opinion/sunday/david-brooks-the-moral-bucket-list.html'
response = urllib.urlopen(bucket).read()
response_dictB = json.loads(response)
print response_dictionary.keys()
We write a function to extract only information we are interested in and call the function and view the first entry:
def parse(mail):
brooks = []
for b in mail:
dic = {}
dic['comment'] = b['commentBody']
dic['date'] = b['createDate']
dic['comment_type'] = b['commentType']
dic['editorsSelection'] =b['editorsSelection']
dic['email'] = b['email']
dic['recommend'] = b['recommendationCount']
dic['replies'] = b['replies']
dic['name'] = b['userDisplayName']
dic['location'] = b['userLocation']
dic['login'] = b['login']
brooks.append(dic)
return brooks
comments=response_dictionary['results']['comments']
comments=parse(comments)
comments[0]
{'comment': u'The lament of the hollow man who sees but does not understand.',
'comment_type': u'comment',
'date': u'1428853503',
'editorsSelection': 0,
'email': u'wilkinson.eileen@gmail.com',
'location': u'Maine',
'login': None,
'name': u'Eileen Wilkinson',
'recommend': 4,
'replies': []}