Html Agility Pack Get All Elements by Class

Html Agility Pack is a powerful tool for parsing HTML source. I needed to parse a document with Html Agility Pack and get all elements by class name. It really is a simple task with Html Agility Pack, but getting the syntax right was the difficult part for me.

Here is my use case:

I needed to select all elements that have the class float. I started with this query, which worked, but only for div tags:

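// _doc is an already-loaded HtmlDocument; needs using System.Linq; and using HtmlAgilityPack;
var findclasses = _doc.DocumentNode.Descendants("div").Where(d =>
    d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("float")
);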

This takes your _doc, finds every div inside it that has an attribute named class, and then goes one step further to make sure the value of that class attribute contains float.

Whew, what a mouthful. Let's look at an example of a node it would select.

<div class="className float anotherclassName">
</div>

So now, how do we get ALL ELEMENTS in the document that have the same float class? Looking back at our Html Agility Pack query, there is one small change we can make to the .Descendants portion that will return all elements by class. It may seem simple, but it took me quite a while to figure out: if you leave .Descendants() empty, it returns every descendant node instead of just divs. Nodes without a class attribute (including text nodes) are then dropped by the Where filter. Look below:

// Descendants() with no tag name walks every node under DocumentNode
var findclasses = _doc.DocumentNode.Descendants().Where(d =>
    d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("float")
);

The query above returns ALL ELEMENTS, of any tag, whose class attribute contains float.
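One caveat with both queries: checking the class value with string.Contains is a substring test, so an element with class="floatLeft" or class="nofloat" would be returned too. If you need an exact class-name match, one option is to split the class attribute on whitespace and compare whole tokens. Below is a minimal sketch in plain .NET (no Html Agility Pack dependency); ClassMatcher and HasClassToken are hypothetical names chosen for illustration, not part of Html Agility Pack.

```csharp
using System;
using System.Linq;

static class ClassMatcher
{
    // Hypothetical helper (not an Html Agility Pack API): returns true only when
    // className appears as a whole, whitespace-separated token in classAttribute.
    public static bool HasClassToken(string classAttribute, string className)
    {
        if (string.IsNullOrEmpty(classAttribute)) return false;

        // HTML separates class names with ASCII whitespace:
        // space, tab, line feed, carriage return, form feed.
        var tokens = classAttribute.Split(
            new[] { ' ', '\t', '\n', '\r', '\f' },
            StringSplitOptions.RemoveEmptyEntries);

        // Ordinal comparison: "Float" and "float" are treated as different classes.
        return tokens.Contains(className, StringComparer.Ordinal);
    }

    static void Main()
    {
        string[] samples = { "className float anotherclassName", "floatLeft", "nofloat", "" };
        foreach (var s in samples)
        {
            bool substring = s.Contains("float");          // what the queries above check
            bool token = HasClassToken(s, "float");        // exact class-name check
            Console.WriteLine($"\"{s}\": Contains={substring}, HasClassToken={token}");
        }
    }
}
```

Inside the Html Agility Pack query, the Where condition could then become d.Attributes.Contains("class") && ClassMatcher.HasClassToken(d.Attributes["class"].Value, "float"), which still selects the example node above but skips floatLeft and nofloat.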

This post is based on my question on Stack Overflow: Html Agility Pack get all elements by class

4 thoughts on “Html Agility Pack Get All Elements by Class”

  1. Rahul Muley

    Hi Adam,

    Thanks for the helpful article.

    Wanted to suggest one edit:
    it should be
    && d.Attributes["class"].Value.Contains("float")
    instead of
    && d.Attributes["class"].Contains("float")

    Please correct me if I misunderstood.

    Thanks,
    Rahul Muley

  2. mala schapira

    Hi,
    and thanks for your example, it is helpful for understanding…
    Just one thing: I am using C# and this syntax is not allowed.
    I have an HTML file and am looking to find all tags (links) which have "lifestyle/" in them.
    IEnumerable linkAtributes = doc.DocumentNode.Descendants("a").Where(d =>
    d.Attributes.Contains("href") && d.Attributes["href"].Contains("lifestyle/")
    This code does not work for me; it's causing an error.
    I'll be happy for some help,
    thank you!!
