Skip to content

Conversation

@krrg
Copy link

@krrg krrg commented Mar 19, 2025

(Feature Proposal)

Currently POI strictly enforces compliance with the OPC specs and provides no mechanism by which these checks can be disabled by the library consumer. Unfortunately, there do exist documents in the wild that do not strictly adhere to one or more of these specifications but which still need to be parsed. I have observed a number of documents "in the wild" over the years which are not strictly compliant, but otherwise can be parsed successfully were it not for the strict compliance checking.

I would like to introduce a new OPCComplianceFlags class which consumers may optionally include when opening an OPCPackage. This will allow users to optionally disable the current strict checking behavior at a granular level for M4.2 through M4.5, allowing them to parse documents that are technically non-compliant, but which are otherwise valid.

This change should be completely backwards compatible. All existing public APIs are retained and will default to the existing behavior of strict OPC enforcement if no compliance flag parameter is passed. I have added a few additional tests to ensure these flags work if specified.

Thanks in advance for any feedback you might have!

* This is useful to allow parsing of certain non-compliant documents.
* @throws OpenXML4JRuntimeException if there are issues creating properties part
*/
OPCPackage(PackageAccess access, OPCComplianceFlags opcComplianceFlags) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add @since POI 5.4.1 to new non-private methods and constructors?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's ok with new classes just to put this @since on the class level content and not on methods/constructors.

@Test
void testDisableEnforceCompatibilityMarkup() throws IOException {
testComplianceFlagOverridePreventsException(
"DoNotUseCompatibilityMarkupFAIL.docx",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these files - should they be in the PR or are they already in the repo? It doesn't look right that they have no relative path info.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if these are new files that need to be added to the repo (test-data dir and it has subdirs based on file type), it will probably break the poi-integration tests which run automatic tests on all files in test-data dir and invalid files will need to be explicitly skipped

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These files already exist and are already being used in the neighboring tests. The existing tests in this file check that we do enforce compliance. My additional tests confirm that when a specific compliance setting (or settings) are turned off, the file then parses without error.

I'm using the same OpenXML4JTestDataSamples.openSampleStream method the other tests use, which I believe handles the actual prefixing, etc.

I ran all of the tests with gradle test, and everything passed on my machine including all the poi-integration tests, so I think we're okay? Happy to adjust anything if I missed something.

@asfgit asfgit closed this in aa964c3 Mar 19, 2025
@pjfanning
Copy link
Member

Thanks - merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants