Python

I have a trouble with parsing a string for my use Posted by laxori666 on 28 Jun 2003 at 9:11 PM
OK, I have a string that I got from a website (once I'm done developing the program, it will read the page from the site every time...)

Here is the program:
--- START ---

import urllib2

# file = urllib2.urlopen("http://www.tremorseven.com/aim/deepaim.php?job=view")

# print "URL Opened: " + file.geturl()
# URLInfo = file.read()
URLInfo = """
<b>Deep Thoughts by Jack Handey</b><br><br>#235: When this girl at the museum asked me whom I liked better, Monet or Manet, I said, "I like mayonnaise." She just stared at me, so I said it again, louder. Then she left. I guess she went to try to find some mayonnaise for me. <br><br><a href="http://www.tremorseven.com/aim/deepaim.php?job=view" target="_self">Refresh</a> | <a href="http://www.tremorseven.com/aim/deepaim.php?job=adding">add Deep Thoughts to your aim</a><br><br><a href="https://www.paypal.com/cgi-bin/webscr?cmd=_xclick&business=tinglea@chilitech.net&item_name=the%20deep%20thoughts%20stay%20online%20fund&no_note=1&currency_code=USD&tax=0">please support this service.</a><br><br><font size=1>a service of <a href="http://www.tremorseven.com/">tremorseven.com</a></font>
"""
print "Contents of URL: "
print URLInfo

for x in range(0, len(URLInfo)):
if (URLInfo[x] == '#'):
for y in range(x, x + 5):
if (URLInfo[y] == ':'):
NumberStr = URLInfo[x+1:y]
print "Number of Deep Thought: " + NumberStr
StartOfThought = y+2
break

for z in range(StartOfThoughts, len(URLInfo)):
if (URLInfo[z] == '<':
EndOfThought = z

print "Contents of Deep Thought:"
print URLInfo[StartOfThought: EndOfThought]

---- END ----

I search for the "#" (this works), then I search for the ":" (this works) and retrieve the number of this deep thought. Then I search from the position after the ":" for a "<" (this doesn't work).

And it does not work (dum dum dum). Any help would be appreciated.
I have a trouble with parsing a string for my use Posted by laxori666 on 28 Jun 2003 at 9:13 PM
Sorry, apparently this messageboard doesn't like tabs... I guess you'll have to add them if you decide to run it yourself.
Re: I have a trouble with parsing a string for my use Posted by laxori666 on 28 Jun 2003 at 9:16 PM
This isn't the problem, but replace "StartOfThoughts" with "StartOfThought", and add in the ")" after "'<'", those didn't show up.
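
For anyone following along: with the indentation put back and the two corrections from this follow-up applied, the final loop presumably behaves like the sketch here. The nesting is a guess, since the board stripped the tabs; find_thought, stop_at_first, and the shortened page text are made up for illustration, and the code is written for Python 3 rather than the Python 2 of the original. Under those assumptions, the loop that looks for "<" never stops at the first hit, so EndOfThought ends up at the last "<" in the page.

```python
# Shortened stand-in for the page text from the original post (an assumption
# made to keep the example small; the structure is the same).
url_info = '<b>Deep Thoughts</b><br><br>#235: I like mayonnaise. <br><br><a href="x">Refresh</a>'


def find_thought(text, stop_at_first):
    """Mirror the posted loops; the indentation is a guess at the intent."""
    number_str = None
    start_of_thought = None
    for x in range(len(text)):
        if text[x] == '#':
            # Look for the ':' within a few characters of the '#'.
            for y in range(x, x + 5):
                if text[y] == ':':
                    number_str = text[x + 1:y]
                    start_of_thought = y + 2  # skip the ':' and the space
                    break

    end_of_thought = None
    for z in range(start_of_thought, len(text)):
        if text[z] == '<':
            end_of_thought = z
            if stop_at_first:
                break  # without this, every later '<' overwrites the index
    return number_str, text[start_of_thought:end_of_thought]


as_posted = find_thought(url_info, stop_at_first=False)
with_break = find_thought(url_info, stop_at_first=True)
print(as_posted)   # the thought plus most of the trailing HTML
print(with_break)  # just the thought
```

If that reading is right, adding a break after "EndOfThought = z" would be the smallest change to the posted program.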
Re: I have a trouble with parsing a string for my use Posted by infidel on 30 Jun 2003 at 9:00 AM
First off, HTML collapses whitespace (tabs included), so you need to use special tags to keep preformatted text properly formatted. On PH, those tags are [code] and [/code].

Here's my first stab (note that I broke the text arbitrarily because preformatted text does not wrap and makes this page scroll far to the right if not forced to break):

import urllib2
import re

text = """<b>Deep Thoughts by Jack Handey</b><br><br>#235: When this girl at the
museum asked me whom I liked better, Monet or Manet, I said, "I like mayonnaise."
She just stared at me, so I said it again, louder. Then she left. I guess she went
to try to find some mayonnaise for me. <br><br>
<a href="http://www.tremorseven.com/aim/deepaim.php?job=view"
target="_self">Refresh</a> |
<a href="http://www.tremorseven.com/aim/deepaim.php?job=adding">add Deep Thoughts
to your aim</a><br><br><a href="https://www.paypal.com/cgi-bin/webscr?
cmd=_xclick&business=tinglea@chilitech.net&item_name=the%20deep%20thoughts%20stay%
20online%20fund&no_note=1&currency_code=USD&tax=0">please support this service.</a>
<br><br><font size=1>a service of <a href="http://www.tremorseven.com/">
tremorseven.com</a></font>""" 

for match in re.finditer("#[0-9]+:", text):
    # The match spans e.g. "#235:", so drop the leading "#" and trailing ":".
    thought_number = text[match.start()+1 : match.end()-1]
    try:
        # The thought runs from just after the colon up to the next "<".
        thought = text[match.end()+1 : text.index("<", match.end() + 1)].strip()
    except ValueError: # '<' character not found in text
        thought = text[match.end()+1 : ].strip()
    print thought_number
    print thought
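
A small aside on the index call used in the loop: the sketch here shows how the optional start argument and the ValueError behave. It is written for Python 3, and the two-thought sample string is invented for the illustration (the second thought, #236, is not from the real page).

```python
# Invented sample: the second "thought" has no markup after it.
text = "#235: I like mayonnaise. <br><br>#236: no markup follows"

first_colon = text.index(":")                 # position of the first ':'
first_tag = text.index("<", first_colon + 1)  # search starts after that ':'
print(text[first_colon + 1:first_tag].strip())

second_colon = text.index(":", first_tag)     # next ':' after the first tag
try:
    end = text.index("<", second_colon + 1)
except ValueError:
    end = len(text)                           # no '<' left, so take the rest
print(text[second_colon + 1:end].strip())

# str.find does the same search but returns -1 instead of raising.
print(text.find("<", second_colon + 1))
```

Whether to use index with try/except or find with a check for -1 is mostly a matter of taste.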


You may not be familiar with regular expressions (regex). Python has an "re" module that lets you use them, and they are perfect for searching text for patterns. The re.finditer() function takes a pattern and a string and returns an iterator of match objects, so you can step through the matches with a for loop. The pattern, "#[0-9]+:", is quite simple as far as regexen go; they can get quite complex. This one says "find a substring that starts with a hash (#), is followed by one or more (+) digits ([0-9]), and ends with a colon (:)". There are entire books written about regular expressions, and I highly recommend you at least learn the basics. I tried to come up with a regular expression that would pick out the "deep thought" as well, but that was beyond my ability, so I opted for the string method "index", which returns the position of a substring you specify (you can optionally specify the start and end points for the search as well) and raises ValueError if the substring isn't found.
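
For what it's worth, a single pattern that also captures the thought seems possible if you assume the thought never contains a "<" itself. The sketch here is written for Python 3 and uses a shortened copy of the page text; the names pattern, thought_number, and thought are just for illustration. The first group grabs the digits, and the second grabs everything up to the next "<".

```python
import re

# Shortened copy of the page text (an assumption to keep the example small).
text = """<b>Deep Thoughts by Jack Handey</b><br><br>#235: When this girl at the
museum asked me whom I liked better, Monet or Manet, I said, "I like mayonnaise."
<br><br><a href="http://www.tremorseven.com/">tremorseven.com</a>"""

# Group 1: one or more digits after '#'.
# Group 2: everything after the colon that is not a '<' (newlines included).
pattern = re.compile(r"#([0-9]+):\s*([^<]*)")

for match in pattern.finditer(text):
    thought_number = match.group(1)
    thought = match.group(2).strip()
    print(thought_number)
    print(thought)
```

Because [^<]* also matches nothing at all, no try/except is needed when the page ends without another tag.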

Try this out and let me know if you have any other questions.


infidel




 
