Correctly "Point" to a Data Element on Webpage |
Crusader |
Nov 26 2022, 11:57 AM
Post
#1
|
Group: Members Posts: 3 Joined: 26-November 22 Member No.: 28,654 |
New member, first post.
For quite a while I had been using the formula below to import "Earnings Date" from CNBC into Google Sheets. A few months ago the formula stopped working. My understanding is that the CNBC page was redesigned and its data elements were moved around, so I am no longer correctly "pointing" to the element I wish to import ("Earnings Date"). I don't know enough HTML to figure out how to correct the formula so that it points to "Earnings Date" (under "Events") on the redesigned page. All my trial-and-error efforts have been in vain.

A ticker symbol has to be entered in the search box at the top right of the page; this prompts CNBC to pull data for that ticker symbol. One of the data elements is "Earnings Date." It can be found on the lower half of the page, under the heading "Events." For a ticker symbol picked at random, AAL, the data element I am looking for is "01/18/2023(est)."

Please note: the formula below is only the HTML portion of the formula I use in Google Sheets; I have left out the part that pertains to the "spreadsheet" portion. I will be happy to share the full formula if that helps.

I am requesting assistance in getting the formula below corrected so it "points" to "Earnings Date" on CNBC's website. If this is not the correct forum, please guide me to an appropriate one.

CODE
"//html/body/div[2]/div/div[1]/div[3]/div/div[2]/div[1]/div[5]/div[2]/section/div[3]/ul/li[1]/span[2]"

All help will be greatly appreciated! |
Christian J |
Nov 29 2022, 11:02 AM
Post
#2
|
. Group: WDG Moderators Posts: 9,739 Joined: 10-August 06 Member No.: 7 |
QUOTE
CODE
"//html/body/div[2]/div/div[1]/div[3]/div/div[2]/div[1]/div[5]/div[2]/section/div[3]/ul/li[1]/span[2]"

The above looks like a mix of a URL and a JavaScript DOM tree. Is it a proprietary format used by Google Sheets? Alas, I have no idea how Google Sheets works. |
Crusader |
Nov 29 2022, 01:53 PM
Post
#3
|
Group: Members Posts: 3 Joined: 26-November 22 Member No.: 28,654 |
QUOTE
The above looks like a mix of a URL and a javascript DOM tree, is it a proprietary format used by GoogleSheets? Alas I have no idea how GoogleSheets works.

I don't know whether this is a format proprietary to Google Sheets. The complete formula is:

CODE
=IMPORTXML("https://www.cnbc.com/quotes/"&A4,"//html/body/div[2]/div/div[1]/div[3]/div/div[2]/div[1]/div[5]/div[2]/section/div[3]/ul/li[1]/span[2]")

where A4 holds the ticker symbol (AAL in my example). |
Christian J |
Nov 29 2022, 05:27 PM
Post
#4
|
. Group: WDG Moderators Posts: 9,739 Joined: 10-August 06 Member No.: 7 |
QUOTE
I don't know HTML to figure out how to correct the formula below so it points to "Earnings Date" (under "Events") on the "redesigned" web page.

How did you arrive at this formula in the first place (when it still worked)? Can't you redo the process for the redesigned CNBC page? |
jimlongo |
Nov 30 2022, 11:07 PM
Post
#5
|
This is My Life Group: Members Posts: 1,128 Joined: 24-August 06 From: t-dot Member No.: 16 |
Why not use some named elements so you don't have to traverse from the top of the DOM?

ul.Summary-events-stock li.Summary-stat span.Summary-value

In the teach-a-man-to-fish department: you should learn to use the Inspector. In your browser, right-click the date you want and choose "Inspect Element". This will tell you a lot about the structure. What I'm suggesting is that the <ul class="Summary-events-stock"> is unique on the page, so you can start there and follow it to the next <li> and then to the correct <span>.

This post has been edited by jimlongo: Nov 30 2022, 11:22 PM |
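Since IMPORTXML takes an XPath string rather than a CSS selector, the suggestion above has to be translated before it can go into the formula. Here is a minimal sketch of that translation in Python, run against a made-up, heavily simplified stand-in for the page. The markup, the `Summary-label` class, and the function name are assumptions for illustration only; the real CNBC page is larger and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. Only the Summary-events-stock,
# Summary-stat and Summary-value class names come from the post above;
# everything else (including Summary-label) is assumed.
SAMPLE = """
<html><body>
  <div><section>
    <div>
      <ul class="Summary-events-stock">
        <li class="Summary-stat">
          <span class="Summary-label">Earnings Date</span>
          <span class="Summary-value">01/18/2023(est)</span>
        </li>
      </ul>
    </div>
  </section></div>
</body></html>
"""

# XPath counterpart of the CSS selector
#   ul.Summary-events-stock li.Summary-stat span.Summary-value
# ElementTree only supports exact attribute matches, so this assumes each
# element carries exactly one class.
XPATH = (
    ".//ul[@class='Summary-events-stock']"
    "/li[@class='Summary-stat']"
    "/span[@class='Summary-value']"
)


def earnings_date(markup: str) -> str:
    """Return the text of the first element matched by XPATH."""
    root = ET.fromstring(markup.strip())
    node = root.find(XPATH)
    if node is None:
        raise LookupError("no element matched; the page structure may have changed")
    return node.text.strip()


print(earnings_date(SAMPLE))
```

In a full XPath engine the more tolerant form would be something like `//ul[contains(@class,'Summary-events-stock')]//span[contains(@class,'Summary-value')]`, which could be tried as the second argument of IMPORTXML. That string has not been tested against the live page.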
Crusader |
Dec 2 2022, 05:39 AM
Post
#6
|
Group: Members Posts: 3 Joined: 26-November 22 Member No.: 28,654 |
Thank you for all the support: it is greatly appreciated! I was able to update the second half of the formula using XPath.

During my research I learnt that websites change ("update") their pages and use newer web technology to prevent "competitors" from importing data from the site; however, that puts individuals like myself in a quandary.

jimlongo, thank you for your suggestion. I will work towards incorporating named elements in my formula: it will make the formula more manageable and easier to update.

For the record, the updated second half of the formula is as follows:

CODE
"//html/body/div[2]/div/div[1]/div[3]/div/div[2]/div[1]/div[5]/div[2]/section/div[4]/ul/li[1]/span[2]" |
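The only difference between the old and the updated path is one index, `div[3]` becoming `div[4]`, which is what you would expect if the redesign added one block ahead of the "Events" list. The sketch below reproduces that effect on an invented miniature page and compares a positional path with a class-based one. The markup, block names, and helper functions are all hypothetical; only the `Summary-events-stock` and `Summary-value` class names are taken from this thread.

```python
import xml.etree.ElementTree as ET


def build_page(extra_block: bool) -> ET.Element:
    """Build a hypothetical mini page, optionally with one extra <div>."""
    promo = "<div>New promo block</div>" if extra_block else ""
    return ET.fromstring(
        "<html><body><section>"
        "<div>Header</div>"
        "<div>Chart</div>"
        f"{promo}"
        "<div><ul class='Summary-events-stock'><li>"
        "<span>Earnings Date</span>"
        "<span class='Summary-value'>01/18/2023(est)</span>"
        "</li></ul></div>"
        "</section></body></html>"
    )


# Positional path: depends on the exact number of sibling <div> elements.
POSITIONAL = "./body/section/div[3]/ul/li[1]/span[2]"
# Class-based path: depends only on the class names staying the same.
BY_CLASS = ".//ul[@class='Summary-events-stock']/li/span[@class='Summary-value']"


def lookup(root: ET.Element, path: str):
    """Return the matched element's text, or None when nothing matches."""
    node = root.find(path)
    return None if node is None else node.text


old_page = build_page(extra_block=False)
new_page = build_page(extra_block=True)

for name, page in (("old", old_page), ("new", new_page)):
    print(name, "positional:", lookup(page, POSITIONAL))
    print(name, "by class:  ", lookup(page, BY_CLASS))
```

On the invented "new" page the positional path finds nothing until its index is bumped by one, while the class-based path keeps matching. The class-based path would still break if the site renamed its classes.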