Cleaning unnecessary characters out of HTML

Once, while viewing an HTML document generated by ASP.NET that contained a GridView, I noticed that a large share of the characters on the page were spaces and tabs. Evidently, ASP.NET inserts them lavishly wherever it can.

Of course, you can shave kilobytes off an HTML document with libraries that compress it using various algorithms. However, that approach forces the server to compress every generated document first, and the user's browser then has to decompress whatever it receives.

In this article, I want to suggest a way to strip the various unnecessary characters from pages. The method is based on an HttpModule.

So what is an HttpModule? Here is what the MSDN library says about it:

An HttpModule is an assembly that implements the IHttpModule interface and handles events. ASP.NET includes a set of HttpModule assemblies that applications can use. For example, SessionStateModule is provided by ASP.NET to supply session state services to an application. Custom HttpModule handlers can be created to respond to either ASP.NET events or user events.

The general procedure for writing an HttpModule is as follows:

  • Implement the IHttpModule interface.
  • Implement the Init method and subscribe to the required events.
  • Handle the events.
  • Implement the Dispose method, if any cleanup is required.
  • Register the module in the Web.config file.

An HTTP module joins the request pipeline after the HttpApplication object has been created and before the HTTP handler is created, so it can handle the following events of the HttpApplication object:

  • BeginRequest
  • AuthenticateRequest
  • PostAuthenticateRequest
  • AuthorizeRequest
  • PostAuthorizeRequest
  • ResolveRequestCache
  • PostResolveRequestCache
  • PostMapRequestHandler
  • AcquireRequestState
  • PostAcquireRequestState
  • PreRequestHandlerExecute
  • PostRequestHandlerExecute
  • ReleaseRequestState
  • PostReleaseRequestState
  • UpdateRequestCache
  • PostUpdateRequestCache
  • EndRequest

Event handlers are attached in the Init method of the HttpModule class.

In my example, I need to attach one handler to two events, ReleaseRequestState and PreSendRequestHeaders, which are raised after the HTML version of the page has been generated and the Response object is ready to be sent to the user.

    /// <summary>
    /// Attaches the event handlers
    /// </summary>
    public void Init(HttpApplication context)
    {
        // Attach the handler to the ReleaseRequestState event
        context.ReleaseRequestState += new EventHandler(this.context_Clear);
        // Attach the handler to the PreSendRequestHeaders event
        context.PreSendRequestHeaders += new EventHandler(this.context_Clear);
        // Two handlers are required for compatibility with HTML document compression libraries
    }

The handler itself will look like this:

    /// <summary>
    /// Handler for the ReleaseRequestState and PreSendRequestHeaders events
    /// </summary>
    void context_Clear(object sender, EventArgs e)
    {
        // Get the HTTP application
        HttpApplication app = (HttpApplication)sender;
        // Get the name of the file that is being processed
        string realPath = app.Request.Path.Remove(0, app.Request.ApplicationPath.Length + 1);
        // Skip requests for assembly resources
        if (realPath == "WebResource.axd")
            return;
        // Check the content type
        if (app.Response.ContentType == "text/html" || app.Response.ContentType == "text/javascript")
            // Install the filter
            app.Context.Response.Filter = new HTMLClearer(app.Context.Response.Filter);
    }

The filter is the most important part: it lets you change the contents of the Response object. The additional checks exclude assembly resources and documents whose type is neither text/html nor text/javascript, since there is no need to strip characters from documents of other types.

Now let’s pay attention to the Response content handler.

This is a class that inherits from System.IO.Stream. In its implementation, only one method is of interest to us, the Write method:

    /// <summary>
    /// Processes the data written to the Response
    /// </summary>
    public override void Write(byte[] buffer, int offset, int count)
    {
        // Convert the bytes to a string; only the count bytes starting at offset are valid
        string s = System.Text.Encoding.UTF8.GetString(buffer, offset, count);
        // Use regular expressions to remove all unnecessary characters
        s = Regex.Replace(s, ">(\r\n){0,10} {0,20}\t{0,10}(\r\n){0,10}\t{0,10}(\r\n){0,10} {0,20}(\r\n){0,10} {0,20}<", "><", RegexOptions.Compiled);
        s = Regex.Replace(s, ";(\r\n){0,10} {0,20}\t{0,10}(\r\n){0,10}\t{0,10}", ";", RegexOptions.Compiled);
        s = Regex.Replace(s, "{(\r\n){0,10} {0,20}\t{0,10}(\r\n){0,10}\t{0,10}", "{", RegexOptions.Compiled);
        s = Regex.Replace(s, ">(\r\n){0,10}\t{0,10}<", "><", RegexOptions.Compiled);
        s = Regex.Replace(s, ">\r{0,10}\t{0,10}<", "><", RegexOptions.Compiled);
        // Convert the resulting string back to bytes
        byte[] outdata = System.Text.Encoding.UTF8.GetBytes(s);
        // Write them to the Response
        _HTML.Write(outdata, 0, outdata.Length);
    }
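To see what this kind of replacement does to a fragment of markup, here is a small console sketch that runs outside ASP.NET. The class name WhitespaceDemo and the single pattern `>\s+<` are assumptions made for illustration: the pattern is a simplified stand-in for the article's expressions, not a copy of them.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the HTMLClearer project.
public static class WhitespaceDemo
{
    // Removes every run of whitespace that sits between a '>' and the next '<'.
    // Simplified pattern (an assumption); the article uses bounded repetitions instead.
    public static string Collapse(string html)
    {
        return Regex.Replace(html, @">\s+<", "><");
    }

    public static void Main()
    {
        string page = "<table>\r\n    <tr>\r\n\t<td>Cell text</td>\r\n    </tr>\r\n</table>";
        string cleaned = Collapse(page);
        Console.WriteLine(cleaned); // prints <table><tr><td>Cell text</td></tr></table>
        Console.WriteLine(page.Length + " characters before, " + cleaned.Length + " after");
    }
}
```

Text inside elements is left alone, but whitespace between two inline elements is removed as well, which can change how a page renders, so a pattern like this is worth checking against real pages before use.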

As well as the class constructor:

public HTMLClearer(System.IO.Stream HTML) 
                { _HTML = HTML; }

To demonstrate the HTTP module and the content filter for the HTTP Response object, let's create a Class Library project and call it HTMLClearer. In this project, create an HTMLClearer.cs file containing the following text:

    using System;
    using System.Collections.Generic;
    using System.Text;
    using System.Web;
    using System.Text.RegularExpressions;

    namespace HTMLClearer
    {
        public class HTMLClearer : System.IO.Stream
        {
            System.IO.Stream _HTML;

            public HTMLClearer(System.IO.Stream HTML)
            { _HTML = HTML; }

            #region Standard methods and properties
            public override bool CanRead
            { get { return false; } }
            public override bool CanSeek
            { get { return false; } }
            public override bool CanWrite
            { get { return true; } }
            public override long Length
            { get { return _HTML.Length; } }
            public override long Position
            {
                get { return _HTML.Position; }
                set { _HTML.Position = value; }
            }
            public override long Seek(long offset, System.IO.SeekOrigin origin)
            { return _HTML.Seek(offset, origin); }
            public override void SetLength(long value)
            { _HTML.SetLength(value); }
            public override void Flush()
            { _HTML.Flush(); }
            public override int Read(byte[] buffer, int offset, int count)
            { return _HTML.Read(buffer, offset, count); }
            #endregion

            /// <summary>
            /// Processes the data written to the Response
            /// </summary>
            public override void Write(byte[] buffer, int offset, int count)
            {
                // Convert the bytes to a string; only the count bytes starting at offset are valid
                string s = System.Text.Encoding.UTF8.GetString(buffer, offset, count);
                // Use regular expressions to remove all unnecessary characters
                s = Regex.Replace(s, ">(\r\n){0,10} {0,20}\t{0,10}(\r\n){0,10}\t{0,10}(\r\n){0,10} {0,20}(\r\n){0,10} {0,20}<", "><", RegexOptions.Compiled);
                s = Regex.Replace(s, ";(\r\n){0,10} {0,20}\t{0,10}(\r\n){0,10}\t{0,10}", ";", RegexOptions.Compiled);
                s = Regex.Replace(s, "{(\r\n){0,10} {0,20}\t{0,10}(\r\n){0,10}\t{0,10}", "{", RegexOptions.Compiled);
                s = Regex.Replace(s, ">(\r\n){0,10}\t{0,10}<", "><", RegexOptions.Compiled);
                s = Regex.Replace(s, ">\r{0,10}\t{0,10}<", "><", RegexOptions.Compiled);
                // Convert the resulting string back to bytes
                byte[] outdata = System.Text.Encoding.UTF8.GetBytes(s);
                // Write them to the Response
                _HTML.Write(outdata, 0, outdata.Length);
            }
        }

        public class HTTPModule_Clearer : IHttpModule
        {
            #region IHttpModule Members
            public void Dispose()
            {
            }

            /// <summary>
            /// Attaches the event handlers
            /// </summary>
            public void Init(HttpApplication context)
            {
                // Attach the handler to the ReleaseRequestState event
                context.ReleaseRequestState += new EventHandler(this.context_Clear);
                // Attach the handler to the PreSendRequestHeaders event
                context.PreSendRequestHeaders += new EventHandler(this.context_Clear);
                // Two handlers are required for compatibility with HTML document compression libraries
            }

            /// <summary>
            /// Handler for the ReleaseRequestState and PreSendRequestHeaders events
            /// </summary>
            void context_Clear(object sender, EventArgs e)
            {
                // Get the HTTP application
                HttpApplication app = (HttpApplication)sender;
                // Get the name of the file that is being processed
                string realPath = app.Request.Path.Remove(0, app.Request.ApplicationPath.Length + 1);
                // Skip requests for assembly resources
                if (realPath == "WebResource.axd")
                    return;
                // Check the content type
                if (app.Response.ContentType == "text/html" || app.Response.ContentType == "text/javascript")
                    // Install the filter
                    app.Context.Response.Filter = new HTMLClearer(app.Context.Response.Filter);
            }
            #endregion
        }
    }
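One thing to keep in mind with a filter like this: ASP.NET may pass the response to Write in several chunks, so a run of whitespace, or even a multi-byte UTF-8 character, can straddle two calls and escape the cleanup. A possible way around it is to collect the chunks and clean the text once at the end. The sketch below is only an illustration under stated assumptions: BufferedClearer is a hypothetical class, the pattern is a simplified `>\s+<`, and the host is assumed to close the filter once the response is complete. It runs as a console program against a MemoryStream instead of a real Response.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical variant of the filter: buffers every chunk, cleans the text once on close.
public class BufferedClearer : Stream
{
    private readonly Stream _inner;                              // next stream in the chain
    private readonly MemoryStream _buffer = new MemoryStream();  // collected response bytes
    private bool _emitted;

    public BufferedClearer(Stream inner) { _inner = inner; }

    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }

    // Only store the bytes; offset and count are honoured so stale buffer contents are never copied.
    public override void Write(byte[] buffer, int offset, int count)
    {
        _buffer.Write(buffer, offset, count);
    }

    // Nothing is forwarded until the whole document is known.
    public override void Flush() { }

    // Stream.Close() ends up here: decode, clean and forward the complete text exactly once.
    protected override void Dispose(bool disposing)
    {
        if (disposing && !_emitted)
        {
            _emitted = true;
            string s = Encoding.UTF8.GetString(_buffer.ToArray());
            s = Regex.Replace(s, @">\s+<", "><"); // simplified pattern (an assumption)
            byte[] outdata = Encoding.UTF8.GetBytes(s);
            _inner.Write(outdata, 0, outdata.Length);
            _inner.Flush();
        }
        base.Dispose(disposing);
    }
}

public static class BufferedClearerDemo
{
    // Feeds the filter two chunks whose whitespace run is split across the boundary.
    public static string Run()
    {
        MemoryStream sink = new MemoryStream();
        byte[] part1 = Encoding.UTF8.GetBytes("<div>\r\n");
        byte[] part2 = Encoding.UTF8.GetBytes("\t<p>text</p>\r\n</div>");
        using (BufferedClearer filter = new BufferedClearer(sink))
        {
            filter.Write(part1, 0, part1.Length);
            filter.Write(part2, 0, part2.Length);
        }
        return Encoding.UTF8.GetString(sink.ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints <div><p>text</p></div>
    }
}
```

The trade-off is that nothing reaches the client until the page is complete, so this variant would not suit pages that rely on flushing partial output early.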

Once that is done, compile the project and reference the resulting library from the website via Add Reference.

Now the HTTP module has to be plugged into the request pipeline. To do this, register the module in the system.web section of the web.config file:

<httpModules>
              <add name="HTTPModule_Clearer"  type="HTMLClearer.HTTPModule_Clearer, HTMLClearer"/> 
         </httpModules>

The general form of the <httpModules> element, taken from MSDN, looks like this:

<httpModules> 
       <add type="classname,assemblyname" name="modulename"/> 
       <remove name="modulename"/> 
       <clear/> 
    </httpModules>

<add>
Adds an HttpModule class to the application.

Note that if an identical verb/path combination was specified earlier (for example, in the Web.config file of a parent folder), the second call to <add> overrides the previous value.

<remove>
Removes an HttpModule class from the application.

<clear>
Removes all HttpModule mappings from the application.
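The registration above lives in the system.web section, which is what the classic pipeline reads. If the site runs on IIS 7 or later in integrated pipeline mode (an assumption about the hosting environment that the article itself does not cover), modules are registered in the system.webServer section instead. A minimal sketch using the same module name and type:

```xml
<system.webServer>
  <modules>
    <add name="HTTPModule_Clearer" type="HTMLClearer.HTTPModule_Clearer, HTMLClearer" />
  </modules>
</system.webServer>
```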

With this module in place, the HTML pages sent to the user shrink by about 10%, which reduces traffic for both the user and the server.

P.S.

An HTTP module can also be used to change the response sent to the user (for example, to add a header or footer to the page).

Also, by extending the Write method, you can save parts of the page, or the entire page, to a database or file.
