Ticket #170 (reopened defect)

Opened 8 years ago

Last modified 8 years ago

problem with SGL_String::summariseHtml

Reported by: demian Assigned to: werner
Priority: normal Milestone:
Component: module - publisher Severity: normal
Keywords: Cc:


It returns an HTML summary of the first few lines of text from the article body. In my opinion it should use strip_tags() and return clean text (no HTML), because other tags can break the layout of the article list if they are not closed (e.g. the table tag; currently only ul/ol tags are closed here, and I ran into this issue).

Maybe you have other ideas... Maybe there is a function that closes all open HTML tags (I have not found one).
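
One way to close whatever tags a truncated fragment leaves open is to track them on a stack while parsing. The sketch below is in Python rather than Seagull's PHP, purely for illustration; `OpenTagTracker`, `close_open_tags` and the `VOID_TAGS` set are hypothetical names, not part of SGL_String, and the approach assumes the fragment does not end in the middle of a tag.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (assumed, non-exhaustive list).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class OpenTagTracker(HTMLParser):
    """Records which elements are still open at the end of a fragment."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching opener; ignore stray closers.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def close_open_tags(fragment):
    """Append closing tags for every element left open, innermost first."""
    tracker = OpenTagTracker()
    tracker.feed(fragment)
    tracker.close()
    closers = "".join(f"</{tag}>" for tag in reversed(tracker.open_tags))
    return fragment + closers


print(close_open_tags("<table><tr><td>cell"))
```

This covers table, div and any other element, not just ul/ol, at the cost of a full parse of the summary.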


170_rev1203.diff (3.9 kB) - added by werner on 01/23/06 18:24:08.

Change History

11/18/05 02:57:42 changed by OpenHaus


Maybe have a look at http://pear.php.net/package/HTML_Safe to filter out any undefined HTML tags (e.g. table) and to close open ones. I am using the package and it works quite well despite its alpha status.

Ciao Fabio.

12/18/05 08:35:11 changed by demian

  • milestone changed from 0.5.4 to 0.5.5.

01/23/06 13:51:34 changed by demian

  • milestone changed from 0.5.5 to 0.5.6.

01/23/06 15:47:25 changed by werner

  • owner changed from somebody to werner.

I wonder why summarise() is based on words (which makes sense to me) while summariseHtml is based on lines (separated by line breaks \n), which isn't good if you use e.g. fck for adding an article, because it strips \n where possible.

How about making it also on a "50 words" basis?
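
A word-based HTML summary could look roughly like the following. It is a Python sketch for illustration only (Seagull itself is PHP); `WordLimitedHtml` and `summarise_html` are hypothetical names, and the handling of mismatched closing tags is deliberately simplistic: a closer that does not match the innermost open element is dropped.

```python
import re
from html import escape
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img"}  # assumed, non-exhaustive


class WordLimitedHtml(HTMLParser):
    """Re-emits markup until a word budget is used up."""

    def __init__(self, max_words):
        super().__init__()
        self.remaining = max_words
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if self.remaining <= 0:
            return
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.remaining <= 0:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.remaining <= 0:
            return
        words = list(re.finditer(r"\S+", data))
        if len(words) <= self.remaining:
            self.remaining -= len(words)
        else:
            # Cut right after the last word that still fits the budget.
            data = data[: words[self.remaining - 1].end()]
            self.remaining = 0
        # The parser hands over decoded text, so re-escape it on output.
        self.out.append(escape(data, quote=False))


def summarise_html(html, max_words=50):
    """Keep the first max_words words and close any tags left open."""
    parser = WordLimitedHtml(max_words)
    parser.feed(html)
    parser.close()
    closers = "".join(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.out) + closers


print(summarise_html("<p>one two <b>three four</b> five</p>", max_words=3))
```

Because the cut is made on words rather than on \n, the result does not depend on whether the editor kept any line breaks.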

01/23/06 18:24:08 changed by werner

  • attachment 170_rev1203.diff added.

02/16/06 06:35:28 changed by demian

  • status changed from new to closed.
  • resolution set to invalid.

hi Werner - a few things regarding your patch

1) it's incomplete: //FIXME: how to pass $allowedTags to array_map?

2) lines are used instead of words because a </p> is expected at the end of a paragraph line, which simplifies parsing

3) it's not an ideal solution: only a few tags are matched, needs improvement

4) any html tag parsing solution uses an array of target tags, not a string as you did

5) let's not alter clean() with "function clean($var, $allowedTags=)", this method has deps

6) for a good example of this please see the HTML_Safe lib, some interesting parsing here

7) the lib seems too big to include (along with XML_HTML_Sax3); it would be great if we could take only the needed parts
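
Points 1 and 4 can be illustrated together: the allowed tags are held as a collection, and that collection is bound to the callback before it is mapped over the lines. This is a Python sketch, not the patch itself and not PHP; `strip_disallowed`, `clean_lines` and the default tag list are hypothetical, and the regex is a naive matcher that will misbehave on attribute values containing ">".

```python
import re
from functools import partial

# Naive tag matcher: captures the tag name of an opening or closing tag.
TAG_RE = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")


def strip_disallowed(line, allowed_tags):
    """Remove every tag whose name is not in allowed_tags; keep the text."""

    def keep_or_drop(match):
        return match.group(0) if match.group(1).lower() in allowed_tags else ""

    return TAG_RE.sub(keep_or_drop, line)


def clean_lines(lines, allowed_tags=("p", "a", "strong", "em")):
    """Apply the same allow-list to each line via a pre-bound callback."""
    allowed = frozenset(allowed_tags)
    # partial() fixes the extra argument, so map() only supplies the line.
    return list(map(partial(strip_disallowed, allowed_tags=allowed), lines))


print(clean_lines(["<p>Hi <img src='x.png'>there</p>",
                   "<table><tr><td>x</td></tr></table>"]))
```

Keeping this in a separate helper leaves the signature of clean() untouched, which is the concern raised in point 5.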

02/21/06 05:09:14 changed by werner

  • status changed from closed to reopened.
  • resolution deleted.
  • milestone deleted.

As the problem still appears, why do you close the ticket?

2) as lines differ in length, this results in ugly output if you use this function to summarise stuff for a "preview". We should definitely switch to the "summarise by words" approach in both functions

3) needs improvement, sure, but we need it to be quite flexible so that only the needed tags are allowed (in a preview placed in a block I don't need images etc.)