HTMLHelp Forums _ General Web Design _ Correctly "Point" to a Data Element on Webpage

Posted by: Crusader Nov 26 2022, 11:57 AM

New member, first post.

For quite a while I had been using the formula below to import "Earnings Date" from https://www.cnbc.com/quotes/AAL?qsearchterm=aal into Google Sheets. A few months ago the formula stopped working. My understanding is that the CNBC page was redesigned and its data elements were moved around, so I am no longer correctly "pointing" to the element I wish to import ("Earnings Date"). I don't know enough HTML to figure out how to correct the formula below so it points to "Earnings Date" (under "Events") on the redesigned web page. All my trial-and-error efforts have been in vain.

A ticker symbol has to be entered in the search box at the top right side of the page; this prompts CNBC to pull data for that ticker symbol. One of the data elements is "Earnings Date." It can be found on the lower half of the page, under the heading "Events." A ticker symbol picked at random: AAL. In this example, the data element I am looking for is "01/18/2023(est)."

Please note: The formula below is only the HTML portion of the formula I use in Google Sheets; I have left out the part that pertains to the spreadsheet itself. I will be happy to share the full formula if that helps.

I am requesting assistance in getting the formula below corrected so it "points" to "Earnings Date" on CNBC's web site.

If this is not the correct forum, please guide me to an appropriate forum.

CODE
"//html/body/div[2]/div/div[1]/div[3]/div/div[2]/div[1]/div[5]/div[2]/section/div[3]/ul/li[1]/span[2]"

All help will be greatly appreciated!

Posted by: Christian J Nov 29 2022, 11:02 AM

QUOTE
CODE
"//html/body/div[2]/div/div[1]/div[3]/div/div[2]/div[1]/div[5]/div[2]/section/div[3]/ul/li[1]/span[2]"

The above looks like a mix of a URL and a JavaScript DOM tree. Is it a proprietary format used by Google Sheets? Alas, I have no idea how Google Sheets works.

Posted by: Crusader Nov 29 2022, 01:53 PM

QUOTE(Christian J @ Nov 29 2022, 12:02 PM) *

The above looks like a mix of a URL and a JavaScript DOM tree. Is it a proprietary format used by Google Sheets? Alas, I have no idea how Google Sheets works.

I don't know if this is a proprietary Google Sheets format. The complete formula is:
CODE
=IMPORTXML("https://www.cnbc.com/quotes/"&A4,"//html/body/div[2]/div/div[1]/div[3]/div/div[2]/div[1]/div[5]/div[2]/section/div[3]/ul/li[1]/span[2]")

Where A4 represents the ticker symbol - in my example, AAL.

Posted by: Christian J Nov 29 2022, 05:27 PM

QUOTE(Crusader @ Nov 26 2022, 05:57 PM) *

I don't know enough HTML to figure out how to correct the formula below so it points to "Earnings Date" (under "Events") on the redesigned web page.

How did you arrive at this formula in the first place (when it still worked)? Can't you redo the process for the redesigned CNBC page?

Posted by: jimlongo Nov 30 2022, 11:07 PM

Why not use some named elements so you don't have to traverse from the top of the DOM?

ul.Summary-events-stock li.Summary-stat span.Summary-value


In the teach a man to fish department … you should learn to use the Inspector.
In your browser right click on the date data you want and choose "Inspect Element".
This will tell you a lot about the structure.

What I'm suggesting is that the <ul class="Summary-events-stock"> is unique on the page, so you can start there and follow to the next <li> and the correct <span>.
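
A minimal sketch of that idea, under stated assumptions. The selector above is written as CSS, while the second argument of IMPORTXML is an XPath expression, so the class names have to be expressed as XPath predicates. The Python below (standard library only) checks such an expression against a small sample fragment. The fragment, the `Summary-label` class and the `earnings_date` helper are made up for illustration; only `Summary-events-stock`, `Summary-stat` and `Summary-value` come from the suggestion above, and the live CNBC markup may well differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment. The real page is much larger and
# its structure may differ from this.
SAMPLE = """
<html><body>
  <div><section>
    <div>
      <ul class="Summary-events-stock">
        <li class="Summary-stat">
          <span class="Summary-label">Earnings Date</span>
          <span class="Summary-value">01/18/2023(est)</span>
        </li>
      </ul>
    </div>
  </section></div>
</body></html>
"""

# Anchor on the class names instead of counting divs from the top of the page.
# ElementTree only supports a subset of XPath, so this uses exact attribute
# matches rather than contains().
XPATH = (
    ".//ul[@class='Summary-events-stock']"
    "/li[@class='Summary-stat']"
    "/span[@class='Summary-value']"
)


def earnings_date(markup: str) -> str:
    """Return the text of the first element matched by XPATH."""
    root = ET.fromstring(markup.strip())
    node = root.find(XPATH)  # find() returns the first match, or None
    if node is None:
        raise LookupError("no element matched " + XPATH)
    return (node.text or "").strip()


if __name__ == "__main__":
    print(earnings_date(SAMPLE))
```

If the live page matches this structure, the corresponding formula would be something like =IMPORTXML("https://www.cnbc.com/quotes/"&A4, "//ul[contains(@class,'Summary-events-stock')]/li[1]/span[contains(@class,'Summary-value')]"). That formula is untested against the live page. Because it anchors on the class name rather than on position, it should survive a div being added or removed higher up the page.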

Posted by: Crusader Dec 2 2022, 05:39 AM

Thank you for all the support: it is greatly appreciated! I was able to update the second half of the formula using "XPath."

During my research I learnt that websites change ("update") their pages and use newer website technology to prevent "competitors" from importing data from the site; however, that puts individuals like myself in a quandary.

Jimlongo, thank you for your suggestion. I will work towards incorporating named elements in my formula: it will make my formula more manageable and easier to update.

For the record, the updated second half of the formula is as follows:

CODE
"//html/body/div[2]/div/div[1]/div[3]/div/div[2]/div[1]/div[5]/div[2]/section/div[4]/ul/li[1]/span[2]"
