Sonntag, 23. März 2025

openpyxl - data_only beware

openpyxl is a popular library to access Excel data via python.

However, there is an unexpected side effect if you have formulas in your Excel workbook.
I'm not the first to find that out the hard way - so here is another warning.

If you have a formula in your xlsx-spreadsheet like "=B1-A1" the spreadsheet app (Excel, LibreOffice etc.) stores the formula as well as the calculated result which you can see in the cell.

If the formula is stored e.g. in "C1", and you open the file in your python script like this:

    import openpyxl

    wb = openpyxl.load_workbook(filename="testwb.xlsx")
    print(sheet["C1"].value)
    # prints the formula

The result will be the formula, not the calculated value.

In order to access the calculated value, you have to open it with data_only set to True.

    import openpyxl

    wb = openpyxl.load_workbook(filename="testwb.xlsx", data_only=True)
    print(sheet["C1"].value)
    # prints the calculated value


The catch is that if you try to save the workbook later with for example

    wb.save("testwb2.xlsx")

all formulas in the entire workbook are gone (if the workbook was loaded with data_only=True).

If you must have access to the results of the formulas that Excel has calculated, the work-around is to open two instances of the workbook: one with and one without data_only.
Make the one using data only also read_only, just in case

    wb = openpyxl.load_workbook(filename="testwb.xlsx", data_only=False)
    wb_dataonly = openpyxl.load_workbook(filename="testwb.xlsx", data_only=True, read_only=True)

This way you have access to the calculated values via wb_dataonly and you can add data and save the result using wb... and yes, you have to keep in mind that the two instances go out of sync as soon as you modify wb.


Keine Kommentare: