doc:appunti:linux:sa:sanitizer

Differences

This shows you the differences between two versions of the page.

doc:appunti:linux:sa:sanitizer [2023/01/19 11:03] – [The HTML MIME multipart problem] niccolo
doc:appunti:linux:sa:sanitizer [2023/01/19 12:09] – [Perl Unescaped left brace warning] niccolo
Line 7: Line 7:
 I use it as a personal mail filter in GNU/Linux mail servers, because it can be activated on a per-user basis, by the **Local Delivery Agent** called by **Postfix**. The LDA can be as simple as **procmail** or the more complex **Dovecot LDA with Pigeonhole Sieve Interpreter**.
  
-===== Perl Syntax Warning =====
+===== Perl Unescaped left brace warning =====
  
-The version included in Debian Bullseye contains a bug into the Perl code, which triggers the warning message:
+The version included in Debian Bullseye uses deprecated syntax in its Perl code, which triggers the warning message:
  
 <code>
 +Unescaped left brace in regex is passed through in regex;
 +</code>
  
 +The offending code is in the file **/usr/share/perl5/Anomy/Sanitizer/MacroScanner.pm**, at lines 120 and 127. Here is the fix:
 +
 +<code perl>
 +$score +=  4 while ($buff =~ s/\000(ID="\{[-0-9A-F]+)$/x$1/i);
 </code>
 +
 +<code perl>
 +$score +=  1 while ($buff =~ s/\000(ID="\{[-0-9A-F]+\}"|ThisWorkbook\000|PrivateProfileString)/x$1/i);
 +</code>
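
A minimal check of the escaped pattern, assuming a made-up sample buffer (the string below is invented for illustration and is not taken from a real macro). With the literal ''{'' written as ''\{'', Perl should no longer emit the warning, and the substitution still matches:

<code perl>
use strict;
use warnings;

# Hypothetical sample buffer, invented for illustration only.
my $buff  = "\000ID=\"{0002-ABCD}\" tail";
my $score = 0;

# Same substitution as the fixed line shown above: the left brace is escaped.
$score += 1 while ($buff =~ s/\000(ID="\{[-0-9A-F]+\}"|ThisWorkbook\000|PrivateProfileString)/x$1/i);

print "score=$score\n";   # prints "score=1"
</code>

The loop stops after one pass because the substitution replaces the leading NUL byte with ''x'', so the pattern cannot match the same spot again.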
 +
  
 ===== The HTML MIME multipart problem =====
Line 19: Line 30:
 Several mail user agents nowadays compose email messages in HTML format, sometimes without including a text-only copy of the same message. Some agents include the HTML as a part of a multipart [[wp>MIME]] message, correctly marked as text/html. Other agents compose the message body directly in HTML, without using the MIME multipart system.
  
-The Anomy Sanitizer uses several methods to detect the HTML parts into a message, relaying on the **Content-Type: text/html** or the **filename** of the MIME part (if specified). Once it detects an HTML part, it performs some operations on it, one of them is the match with a **regular expression** to confirm that it is actually an HTML text. If that regex test fails, the Sanitizer neutralizes (defang) such part changing its content type from **text/html** to something like **application/ANTIVIRUS- 14789** (the type name is composed using the **msg_defanged** configuration option).
+In some circumstances Sanitizer defangs the HTML message or the HTML part (changing its content type), so a modern email reader does not display it correctly. In the best case an **anonymous attachment** is shown; in the worst case **an empty message** is shown.
 + 
 +The Anomy Sanitizer uses several methods to detect the HTML parts in a message, relying on the **Content-Type: text/html** header or on the **filename** of the MIME part (if specified). Once it detects an HTML part, it performs several operations on it; one of them is a match against a **regular expression** to confirm that the part really is HTML text. If that regex test fails, the Sanitizer neutralizes (defangs) the part by changing its content type from **text/html** to something like **application/DEFANGED-14789** (the type name is composed using the **msg_defanged** configuration option).
  
 That behaviour is triggered by the **feat_files = 1** configuration option (enable filename-based policy decisions).
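
The check-then-defang step described above can be sketched roughly as follows. This is only an illustration: the ''looks_like_html'' and ''defang_type'' helpers, the simplified regex and the fixed ''DEFANGED-14789'' label are assumptions made for this example, not the Sanitizer's real code or its real pattern.

<code perl>
use strict;
use warnings;

# Hypothetical, simplified test; the real regex is defined by the Sanitizer.
sub looks_like_html {
    my ($body) = @_;
    return $body =~ /<(?:html|body|head|!doctype)\b/i ? 1 : 0;
}

# Hypothetical helper: rewrite the content type when the regex test fails.
sub defang_type {
    my ($type, $body, $label) = @_;
    return $type if lc($type) ne 'text/html';   # only HTML parts are checked
    return $type if looks_like_html($body);     # confirmed HTML: keep the type
    return "application/$label";                # test failed: defang the part
}

print defang_type('text/html', '<html><body>Hi</body></html>', 'DEFANGED-14789'), "\n";
print defang_type('text/html', '<p>Hi</p>', 'DEFANGED-14789'), "\n";
</code>

Under these assumptions the first call keeps ''text/html'', while the second, an HTML fragment without the expected opening tags, comes back as ''application/DEFANGED-14789''. That is the kind of mismatch that leaves a mail reader with an anonymous attachment or an empty message.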
Line 45: Line 58:
 </code>
  
 +It is also possible to remove the ''regexp'' element of the dictionary; in this case Sanitizer will recognize an HTML part only by the content type or the filename.
 +
 +The customized Perl module can be installed as **/etc/perl/Anomy/Sanitizer/FileTypes.pm**, without changing the file installed by the Debian package.
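
To check which copy of the module Perl would actually load, one can walk the module search path. This is a small sketch, assuming a Debian-style Perl where **/etc/perl** is listed in ''@INC'' ahead of **/usr/share/perl5** (on other systems it may be absent or ordered differently):

<code perl>
use strict;
use warnings;

# Print the module search path in lookup order; the first directory that
# contains Anomy/Sanitizer/FileTypes.pm is the one Perl will use.
my $rel = 'Anomy/Sanitizer/FileTypes.pm';
for my $dir (@INC) {
    next if ref $dir;    # skip code-reference hooks, if any
    my $mark = -e "$dir/$rel" ? '  <-- has FileTypes.pm' : '';
    print "$dir$mark\n";
}
</code>

If the customized file is marked on an earlier line than the packaged one, it takes precedence.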
  
doc/appunti/linux/sa/sanitizer.txt · Last modified: 2023/01/19 12:11 by niccolo