Parsing files

What are your methods of parsing files? I need to parse an HTML file
that has data contained inside each table row. I need to determine how many rows the table has so I can get every value.

Comments

  • : What are your methods of parsing files? I need to parse an HTML file
    : that has data contained inside each table row. I need to determine how many rows the table has so I can get every value.
    :
    HTML files can be very hard to parse. But I would start by stripping everything away except the table you want to parse (keeping the <table> and </table> tags). The number of rows is then equal to the number of <tr> tags.
    Hint: Count occurrences of "<tr" rather than "<tr>", since a tag might have attributes. The same goes for other tags.
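The counting idea above can be sketched as follows. This is only a minimal illustration, assuming the table markup has already been loaded into a string; `CountOccurrences` and `SampleHtml` are hypothetical names, and the sample markup is made up for the demonstration. The match ignores case because HTML tag names may be written as `<TR>` or `<tr>`.

```pascal
program CountRows;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}{$H+}

uses
  SysUtils;

// Counts how many times SubStr occurs in S, ignoring case.
// Matches do not overlap: the scan resumes after each hit.
function CountOccurrences(const SubStr, S: string): Integer;
var
  LowerSub, LowerS: string;
  i: Integer;
begin
  Result := 0;
  LowerSub := LowerCase(SubStr);
  LowerS := LowerCase(S);
  if LowerSub = '' then
    Exit;
  i := 1;
  while i <= Length(LowerS) - Length(LowerSub) + 1 do
  begin
    if Copy(LowerS, i, Length(LowerSub)) = LowerSub then
    begin
      Inc(Result);
      Inc(i, Length(LowerSub));
    end
    else
      Inc(i);
  end;
end;

const
  // Hypothetical sample; a real program would read the file into a string.
  SampleHtml =
    '<table border="0">' +
    '<TR class="odd"><td>06.02.2004</td><td>0.87</td></TR>' +
    '<tr><td>05.02.2004</td><td>0.85</td></tr>' +
    '</table>';

begin
  // Searching for '<tr' (no closing bracket) also finds rows with attributes.
  WriteLn('Rows: ', CountOccurrences('<tr', SampleHtml));
end.
```

For the sample string this prints `Rows: 2`. Note that a plain substring count is fooled by nested tables, by rows inside HTML comments, and by any other tag whose name starts with "tr", so it only suits simple, regular files.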
  • :: I need to parse an HTML file that has data contained
    :: inside each table row. I need to determine how many
    :: rows the table has so I can get every value.

    : HTML Files can be very hard to parse.

    Indeed, tags inside of comments, nested tables,
    or even ( heaven forbid ) badly written HTML
    can all make this a tricky process...

    http://www.houston.quik.com/~jkp/tidypas/

    [code]

    program readtables;

    {$APPTYPE CONSOLE}{$H+}

    uses tidy;

    var
      doc: tTidy;

    // Walk the tidy parse tree and print the text found in every table cell.
    procedure WalkNode(aNode: pTidyNode; level: string);
    var
      tr, td, data: pTidyNode;
      buf: TidyBuffer;
      txt: string;
    begin
      if (aNode <> nil) then with aNode^ do begin
        if TidyNodeIsTable(aNode) then begin // Found a table
          WriteLn(level, 'Table:');
          tr := aNode^.content; // First child node of the table
          while (tr <> nil) do begin // "while" also copes with an empty table
            if TidyNodeIsTr(tr) then begin // Found a row
              WriteLn(level, ' Row:');
              td := tr^.content;
              while (td <> nil) do begin
                if TidyNodeIsTD(td) and (td^.content <> nil) then begin // Found a cell...
                  WriteLn(level, ' Cell:');
                  data := td^.content; // Content of the cell
                  while (data <> nil) do begin // Dig in to the cell data
                    if TidyNodeIsTable(data) then begin
                      WalkNode(data, level + ' ') // Nested table, recurse!
                    end else begin
                      TidyBufInit(@buf);
                      TidyNodeGetText(doc.Handle, data, @buf); // Text of the current child
                      txt := buf.bp;
                      Write(level, ' ', txt);
                      // Finish the line unless the text already ends with a line break
                      if (txt = '') or not (txt[Length(txt)] in [#13, #10]) then WriteLn;
                      TidyBufFree(@buf);
                    end;
                    data := data^.next;
                  end;
                end;
                td := td^.next;
              end;
            end;
            tr := tr^.next; // Examine next node
          end;
        end;
        WalkNode(content, level + ' '); // Examine child nodes (recursive)
        WalkNode(next, level); // Examine next node (recursive)
      end;
    end;


    begin
      doc := tTidy.Create(nil);
      try
        doc.ParseFile('your.html');
        doc.ForceOutput := True; // Force tidy to produce output, even for bad HTML
        doc.ErrorFile := TIDY_NULL_FILE; // Ignore error messages
        doc.html; // Tell tidy we want HTML format
        WalkNode(doc.RootNode, ''); // Walk the document tree
      finally
        doc.Free; // Release the document even if parsing fails
      end;
    end.
    [/code]
  • This is the file that I have to parse:

    [code]






    Enterprise Stocks


    06.02.2004 0.87 0.88 0.86
    image 2.35%
    05.02.2004 0.85 0.88 0.85
    image -2.30%
    04.02.2004 0.87 0.88 0.86
    image 0%
    03.02.2004 0.88 0.88 0.88
    image 0%
    02.02.2004 0.88 0.90 0.86
    image 3.53%
    01.02.2004 0.85 0.85 0.85
    image 0%
    31.01.2004 0.85 0.85 0.85
    image 0%
    30.01.2004 0.85 0.85 0.85
    image 0%
    29.01.2004 0.85 0.87 0.85
    image -2.30%
    28.01.2004 0.87 0.88 0.85
    image 2.35%
    27.01.2004 0.86 0.86 0.85
    image 1.18%
    26.01.2004 0.83 0.83 0.81
    image 2.47%


    [/code]

    Each cell that has text represents a value that my application has to understand. In each row, the first value represents, for example, the last value of a stock, and the others are the maximum, the minimum, and the variation. Now I need to draw a graph directly onto a TImage canvas. Is there an easy way to do it? I'm thinking of putting all the values into an invisible StringGrid, but that seems "less professional". How would you do it?
  • : This is the file that I have to parse:
    :
    : Each cell that has text represents a value that my application has to understand. In each row, the first value represents, for example, the last value of a stock, and the others are the maximum, the minimum, and the variation. Now I need to draw a graph directly onto a TImage canvas. Is there an easy way to do it? I'm thinking of putting all the values into an invisible StringGrid, but that seems "less professional". How would you do it?
    :
    This file is quite easy to parse, because the table is very repetitive. As for the data storage, I would use a TList and a record representing a single item (row) in the table. I would not use a TImage for the output but a TPaintBox, which is designed for it. You can find it on the System page of the component palette.
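As a rough illustration of the TList-plus-record suggestion above, here is a minimal console sketch. The type and field names (`TStockRow`, `LastValue`, `Maximum`, `Minimum`, `Variation`) and the helper routines are assumptions made for this example; the two rows use values from the posted table, with the column meaning taken from the poster's description.

```pascal
program StockRows;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}{$H+}

uses
  SysUtils, Classes;

type
  // One table row. Field names are assumptions, not taken from the thread.
  PStockRow = ^TStockRow;
  TStockRow = record
    Day: string;        // date text exactly as it appears in the table
    LastValue: Double;
    Maximum: Double;
    Minimum: Double;
    Variation: Double;  // percentage, e.g. -2.30
  end;

// Allocates a record on the heap and stores its pointer in the list.
procedure AddRow(List: TList; const ADay: string;
  ALast, AMax, AMin, AVariation: Double);
var
  Row: PStockRow;
begin
  New(Row);
  Row^.Day := ADay;
  Row^.LastValue := ALast;
  Row^.Maximum := AMax;
  Row^.Minimum := AMin;
  Row^.Variation := AVariation;
  List.Add(Row);
end;

// TList does not own what its pointers refer to, so free each record.
procedure ClearRows(List: TList);
var
  i: Integer;
  Row: PStockRow;
begin
  for i := 0 to List.Count - 1 do
  begin
    Row := PStockRow(List[i]);
    Dispose(Row);
  end;
  List.Clear;
end;

var
  Rows: TList;
  Row: PStockRow;
  i: Integer;
begin
  Rows := TList.Create;
  try
    AddRow(Rows, '06.02.2004', 0.87, 0.88, 0.86, 2.35);
    AddRow(Rows, '05.02.2004', 0.85, 0.88, 0.85, -2.30);
    for i := 0 to Rows.Count - 1 do
    begin
      Row := PStockRow(Rows[i]);
      WriteLn(Row^.Day, '  last=', Row^.LastValue:0:2,
        '  variation=', Row^.Variation:0:2, '%');
    end;
  finally
    ClearRows(Rows);
    Rows.Free;
  end;
end.
```

A paint handler could then loop over the same list and convert each `LastValue` to a pixel position; keeping the data in a plain list rather than a hidden grid separates storage from display.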
  • This message was edited by OxyzN at 2004-2-10 6:36:8
    : : This is the file that I have to parse:
    : :
    : : Each cell that has text represents a value that my application has to understand. In each row, the first value represents, for example, the last value of a stock, and the others are the maximum, the minimum, and the variation. Now I need to draw a graph directly onto a TImage canvas. Is there an easy way to do it? I'm thinking of putting all the values into an invisible StringGrid, but that seems "less professional". How would you do it?
    : :
    : This file is quite easy to parse, because the table is very repetitive. As for the data storage, I would use a TList and a record representing a single item (row) in the table. I would not use a TImage for the output but a TPaintBox, which is designed for it. You can find it on the System page of the component palette.
    :
    When I try to get the numeric values, such as "1.35", Delphi recognizes it as "0.35". How can I handle this? Also, negative numbers are recognized as positive. I use the StrToFloat and FloatToStr functions.

  • : When I try to get the numeric values, such as "1.35", Delphi recognizes
    : it as "0.35". How can I handle this? Also, negative numbers are
    : recognized as positive. I use the StrToFloat and
    : FloatToStr functions.


    Just a wild guess, but it sounds like you might have
    an off-by-one error in your parsing routines.

    For instance:
    1.35
    becomes:
    [ 1 ] [ .35 ] [ ]
    instead of:
    [ ] [ 1.35 ] [ ]

    likewise:
    -1.0
    becomes:
    [ - ] [ 1.0 ] [ ]
    instead of:
    [ ] [ -1.0 ] [ ]
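One way to rule out the kind of off-by-one slip guessed at above is to cut the text at the separators instead of at computed positions, so each token keeps its sign and leading digit. The sketch below is only an illustration under stated assumptions: `SplitTokens` and `ParseNumber` are hypothetical helpers, the sample line is a flattened version of one posted row, and the standard `Val` procedure is swapped in for StrToFloat because `Val` always expects "." as the decimal separator, whereas StrToFloat follows the regional settings.

```pascal
program ParseValues;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}{$H+}

uses
  SysUtils, Classes;

// Splits Line on spaces and tabs; every token is copied whole,
// from its first character up to (not including) the next separator.
procedure SplitTokens(const Line: string; Tokens: TStrings);
var
  i, Start: Integer;
begin
  Tokens.Clear;
  i := 1;
  while i <= Length(Line) do
  begin
    // Skip separators in front of the token
    while (i <= Length(Line)) and (Line[i] in [' ', #9]) do
      Inc(i);
    Start := i;
    // Advance to the end of the token
    while (i <= Length(Line)) and not (Line[i] in [' ', #9]) do
      Inc(i);
    if i > Start then
      Tokens.Add(Copy(Line, Start, i - Start));
  end;
end;

// Converts text such as '-2.30' or '2.35%' to a number.
// Returns False when the text is not a valid number.
function ParseNumber(const S: string; out Value: Double): Boolean;
var
  T: string;
  Code: Integer;
begin
  T := Trim(S);
  // Drop a trailing percent sign, as used in the variation column
  if (T <> '') and (T[Length(T)] = '%') then
    Delete(T, Length(T), 1);
  Val(T, Value, Code); // Code = 0 means the whole string was converted
  Result := Code = 0;
end;

var
  Tokens: TStringList;
  i: Integer;
  V: Double;
begin
  Tokens := TStringList.Create;
  try
    // Hypothetical flattened row: date, three prices, variation
    SplitTokens('05.02.2004 0.85 0.88 0.85 -2.30%', Tokens);
    for i := 1 to Tokens.Count - 1 do // token 0 is the date
      if ParseNumber(Tokens[i], V) then
        WriteLn(Tokens[i], ' -> ', V:0:2)
      else
        WriteLn(Tokens[i], ' is not a number');
  finally
    Tokens.Free;
  end;
end.
```

Printing each token before converting it, as the loop does, makes it easy to see whether a digit or a minus sign was lost during splitting rather than during conversion.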