Is there any info out there on how the new Excel data source works?
Great question
This merged as part of V10 but isn’t included in the docs yet. I haven’t used it before but I just loaded it up and have a working version. I’ll open a PR to the documentation website soon. I figured this out just by looking at the query runner source code
But for now here you go:
Excel Data Source
The Excel data source introduced in V10 accepts queries in yaml
format. Your yaml
may contain two different tags:
- url - the URL that will be fetched
- ua - the user agent string that will be passed in with the request
An example query that uses a publicly available data set:
url: https://www.dol.gov/sites/dolgov/files/ETA/naws/pdfs/NAWS_A2E191.xlsx
It produces this table:
Not sure if I should create a new topic, or if i am allowed to continue here, but i was wondering if the excel data source allows picking up multiple files from the location ( something like location*.xlsx, or location test%.xlsx)
Not as far as I know. But this could be a good addition. How would you expect this to behave? Since most queries return at most one set of tabular data, would you expect multiple table visualisations to appear? Would the tables be concatenated?
i was thinking of a scenario where the files are similar in structure (eg: different day logs , etc), so the expected behavior in this case would be indeed concatenation.
That makes sense. Can you create a feature request thread for this here on the forum? Just specify the expected behavior. This way the community can get visibility on outstanding pull requests.
Alternatively, if you want to implement this yourself I’m happy to answer questions, review, merge etc.
How does the Excel data source handle spreadsheets that contain multiple sheets / tabs?
If you look at the query runner code it appears to use the default behavior of read_excel
in Pandas. Which is to pull the first sheet. It would be a trivial adjustment to make this configurable in the query, though.
See the query runner code here.
Hi !
I rode the query runner again and documentation too. Actually, it’s already possible to query a sheet number with XLS file
The solution is here (excel.py, line 57) :
workbook = pd.read_excel(response.content, **args)
As you can see, the list called args is depacted as function parameters. So, we just have to use the correct parameter name to use it, according to the read_excel function !
For example, this is my nextcloud spreadsheet. First share the link (yellow rectangle) :
Then, copy the download link (yellow rectangle) :
Create a new query with excel datasource and fill parameters “url” with the link copied and “sheet_name” with the sheet number. First sheet = 0 (by default) / second sheet = 1 / etc.
You can easily change the sheetnumber like this :
And of course others parameters like nrows :
see you
could someone help with authenticating to excel datasource, e.g. if we have an excel hosted on sharepoint, it will require authentication.
What kind of authentication is needed? HTTP Basic auth? Oauth?
yes basic auth should work
Alright the existing excel query runner doesn’t support basic auth. This means there are a couple options:
- Write a custom sharepoint query runner so you enter the username and password in secret fields on the data source configuration screen (I can help you with this).
- Update the excel query runner to accept HTTP basic auth key-value-pairs
- Use the existing Python query runner so you can access
requests
directly and also transform the data however you need.
May I know whether it will work with Microsoft OneDrive hyperlink? It seems it’s not working.
Another question is, I got the following error when trying to load the sample excel file (url: https://www.dol.gov/sites/dolgov/files/ETA/naws/pdfs/NAWS_A2E191.xlsx):
Thanks a million for your comment.
Hey @jesse !
Is it possible to read an excel data source and provide a private token ? I’m working with KoboToolbox and I would like to access to private data. Any idea ?