Html Agility Pack is a powerful tool for parsing document source. I needed to parse a document with Html Agility Pack and get all elements by class name. It really is a simple query with Html Agility Pack, but getting the syntax right was the difficult part for me.
Here is my use case:
I need to select all elements that have the class float on them. I started with this query, which worked for div elements only:
var findclasses = _doc.DocumentNode.Descendants("div").Where(d => d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("float"));
This takes your _doc, finds every div inside it that has an attribute named class, and then goes one step further to ensure that the attribute's value contains float.
Whew, what a mouthful. Let's look at an example node it would select:
<div class="className float anotherclassName"> </div>
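To see the div query end to end, here is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced. The DivClassQuery class and FindDivsByClass method are names made up for this illustration; only HtmlDocument, LoadHtml, Descendants, Attributes and Value come from the library. Note that it reads the class attribute's Value, which is the string that Contains is called on.

```csharp
// Assumes the HtmlAgilityPack NuGet package is installed.
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class DivClassQuery
{
    // Hypothetical helper: returns every <div> whose class attribute
    // contains className as a substring.
    public static List<HtmlNode> FindDivsByClass(string html, string className)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("div")                                // div elements only
            .Where(d => d.Attributes.Contains("class")         // has a class attribute
                     && d.Attributes["class"].Value.Contains(className))
            .ToList();
    }
}
```

Calling DivClassQuery.FindDivsByClass(html, "float") on markup that holds the example div above, plus a span with the same class, should give back only the div, because Descendants("div") never looks at the span.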
So now, how do we get ALL ELEMENTS in the doc that carry the same float class? If we look back at our Html Agility Pack query, there is one small change we can make to the .Descendants portion that will return all elements by class. This may seem simple, but it took me quite a while to arrive at: if you call .Descendants() with no arguments, it returns every descendant node instead of just the divs. Look below:
var findclasses = _doc.DocumentNode.Descendants().Where(d => d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("float"));
The query above returns ALL ELEMENTS whose class attribute contains float.
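One thing to watch: Contains("float") is a substring test, so it would also match a class such as floatLeft. If you need the exact class name, a possible approach is to split the attribute value on whitespace and compare whole tokens. The sketch below is only an illustration; ClassTokens and HasClass are hypothetical names, not part of Html Agility Pack.

```csharp
using System;
using System.Linq;

public static class ClassTokens
{
    // Hypothetical helper: true when classAttribute contains className
    // as a whole, whitespace-separated token (not just as a substring).
    public static bool HasClass(string classAttribute, string className)
    {
        if (string.IsNullOrWhiteSpace(classAttribute))
        {
            return false;
        }

        // Passing a null separator array makes Split break on any whitespace.
        return classAttribute
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Any(token => token == className);
    }
}
```

With Html Agility Pack this could be plugged into the same query as _doc.DocumentNode.Descendants().Where(d => ClassTokens.HasClass(d.GetAttributeValue("class", ""), "float")), where GetAttributeValue falls back to an empty string when the node has no class attribute.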
Documented based on my question on Stack Overflow: Html Agility Pack get all elements by class