CSci 150: Foundations of computer science I

Python: Retrieving over the network

traceroute to 209.65.57.4 (209.65.57.4), 30 hops max, 40 byte packets
 1  daisy.getnet.net (216.19.223.119)  0.068 ms  0.015 ms  0.012 ms
 2  phnx-gsr0.getnet.net (216.19.201.241)  0.435 ms  0.422 ms  0.548 ms
 3  209.145.192.233 (209.145.192.233)  1.007 ms  1.122 ms  1.143 ms
 4  gi1-4.ccr01.phx02.atlas.cogentco.com (38.104.116.81)  0.942 ms  0.935 ms  0.921 ms
 5  te4-1.ccr01.san01.atlas.cogentco.com (154.54.7.86)  9.873 ms te3-1.ccr01.san01.atlas.cogentco.com (154.54.27.109)  10.023 ms te4-1.ccr01.san01.atlas.cogentco.com (154.54.7.86)  10.039 ms
 6  te9-1.ccr02.lax01.atlas.cogentco.com (154.54.27.121)  12.762 ms  13.059 ms  13.044 ms
 7  te8-3.ccr02.lax05.atlas.cogentco.com (154.54.29.202)  13.173 ms  13.574 ms  13.839 ms
 8  192.205.35.5 (192.205.35.5)  15.961 ms  15.952 ms 192.205.37.157 (192.205.37.157)  15.098 ms
 9  cr2.la2ca.ip.att.net (12.122.84.218)  82.167 ms  83.625 ms  82.419 ms
10  cr1.slkut.ip.att.net (12.122.30.29)  81.419 ms  80.472 ms  80.478 ms
11  cr2.dvmco.ip.att.net (12.122.30.26)  82.137 ms  80.227 ms  82.438 ms
12  cr1.dvmco.ip.att.net (12.122.31.21)  81.124 ms  81.116 ms  80.440 ms
13  cr1.kc9mo.ip.att.net (12.122.3.46)  82.407 ms  80.479 ms  81.897 ms
14  cr2.kc9mo.ip.att.net (12.122.28.82)  80.813 ms  81.287 ms  82.564 ms
15  cr2.sl9mo.ip.att.net (12.122.28.90)  82.383 ms  81.122 ms  81.093 ms
16  cr1.sl9mo.ip.att.net (12.122.2.217)  83.331 ms  81.982 ms  83.288 ms
17  gar1.ltrar.ip.att.net (12.122.112.5)  81.631 ms  81.621 ms *
18  12.94.230.206 (12.94.230.206)  81.522 ms  81.670 ms  81.550 ms
19  209.65.57.62 (209.65.57.62)  82.302 ms  82.485 ms  82.098 ms

A primitive way of retrieving Web pages, speaking HTTP directly over a TCP socket:

import socket

# Split the URL into a host name and the path of the page on that host.
url = raw_input('URL? ')
if url.startswith('http://'):
    url = url[7:]
slash = url.find('/')
if slash < 0:
    hostname = url
    pagename = '/'
else:
    hostname = url[:slash]
    pagename = url[slash:]

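# Open a TCP connection to the Web server on port 80 (HTTP).
conn = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
conn.connect((hostname, 80))

# Build the request. 'Connection: close' asks the server to hang up
# after responding; without it an HTTP/1.1 server may keep the
# connection open and the receive loop would wait for more data.
# The empty string produces the blank line that ends the headers.
to_send = [
    'GET ' + pagename + ' HTTP/1.1',
    'Host: ' + hostname,
    'Connection: close',
    '']
for line in to_send:
    # sendall keeps sending until the whole line has gone out.
    conn.sendall(line + '\r\n')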

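# recv returns an empty string once the server closes the connection.
# (The variable is named data so it does not hide the built-in next.)
data = conn.recv(102400)
while data:
    print data
    data = conn.recv(102400)
conn.close()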
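
The URL handling and request formatting above can be pulled out into small functions that are easy to check without a network connection. What follows is only a sketch: the names split_url, build_request and split_response are made up for this illustration and are not part of the socket module. It adds a 'Connection: close' header so that the server closes the connection once it has answered, and it assumes the whole response has already been collected into one string before split_response is called.

```python
def split_url(url):
    """Split a URL such as 'http://host/path' into (hostname, pagename)."""
    if url.startswith('http://'):
        url = url[7:]
    slash = url.find('/')
    if slash < 0:
        return url, '/'
    return url[:slash], url[slash:]


def build_request(hostname, pagename):
    """Return the text of a minimal HTTP/1.1 GET request."""
    lines = [
        'GET ' + pagename + ' HTTP/1.1',
        'Host: ' + hostname,
        'Connection: close',  # ask the server to close when it is done
        '']                   # the blank line ends the headers
    return ''.join(line + '\r\n' for line in lines)


def split_response(response):
    """Split a complete response into (status line, header lines, body)."""
    # The headers end at the first blank line; the page follows it.
    head, _, body = response.partition('\r\n\r\n')
    lines = head.split('\r\n')
    return lines[0], lines[1:], body
```

For example, split_url('www.example.com') gives ('www.example.com', '/'), the same default page name the program above uses when the URL contains no slash.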

Using urllib, which hides all the ugly details:

import urllib

url = raw_input('URL? ')
conn = urllib.urlopen(url)
content = conn.read()
conn.close()

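# Find the opening tag first, then look for the closing tag only
# after it, so an earlier stray '</title>' cannot be picked up.
start = content.find('<title>')
end = -1
if start >= 0:
    end = content.find('</title>', start)
if start >= 0 and end >= 0:
    print content[start + 7:end].strip()
else:
    print 'title unknown'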
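
The search for the title relies on the tags being written exactly as '<title>' and '</title>' in lower case. A possible variation, sketched here with the standard re module (which the program above does not use), also accepts upper-case tags and a title that spans several lines. The function name extract_title is hypothetical, and the sketch assumes the opening tag carries no attributes.

```python
import re

# Match the text between <title> and </title>, ignoring the case of the
# tag names and letting the title run across line breaks.
TITLE_PATTERN = re.compile(r'<title>(.*?)</title>', re.IGNORECASE | re.DOTALL)


def extract_title(content):
    """Return the page title with surrounding whitespace removed,
    or None if the page has no title element."""
    match = TITLE_PATTERN.search(content)
    if match is None:
        return None
    return match.group(1).strip()
```

A caller would pass in the string returned by conn.read() and print 'title unknown' when the result is None.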