
Why include a license notice in each file in a repo (with exceptions)

Providing only a single license file at the top of an open source project repo is inadequate, for the following reasons:

  • Files can be pulled out of the repo and used for arbitrary purposes, and the license will not go with them unless the user makes the effort to copy the license into the files they use. If each file carries its own license, inadvertent loss of license clarity is avoided, and misuse is more strongly discouraged.
  • In general, not having a license in each file complicates efforts to scan for
    • compatibility of code with other code it tightly integrates with (e.g. through “import” or linking)
    • provenance of the code (the history of contributions, who owns the copyright, etc.)
    • Without per-file licenses, such a scanning process would have to somehow know:
      • where the original source repo of the project is (to associate the integrated code with a project)
      • where to find the license file in the repo (there are common practices, but no firm standards for this)

Thus, in general and with limited exceptions, any file whose type allows a license to be embedded as comments should carry a code or documentation license. This applies to all file types: code, data, docs, and READMEs (informal docs, user guides, etc.).

When and why (or why not) to add a license

An up-to-date license should be included in each file that:

  • is code, data, or documentation
  • is new or substantially changed for this project
  • is a substantial part of the work of the project
  • has a file type that is compatible with comments
  • note: media files are typically covered by the repo-top license file, but they often *do* support metadata for a license

A license should not be included in each file if:

  • the file does not meet the criteria above
  • the file type is (conventionally) compatible with comments but the project software that uses the file isn't
    • however, this really represents a bug in the project software

A LICENSE.txt file should be included in the top folder of the repo:

  • as a catch-all for any files that don't / can't carry a license internally
  • but NOT as a way to avoid having to include licenses within each file

Process for license additions to Acumos code

Document exclusions for files that typically carry no/minimal unique content, e.g.

  • python __init__.py
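
Documented exclusions can be kept as a small table of filename patterns. The following is a minimal sketch, not the project's actual rule set: only `__init__.py` comes from the list above, and the names `EXCLUSION_PATTERNS` and `is_excluded` are hypothetical.

```python
import re

# Hypothetical exclusion rules; only __init__.py is named in this document.
EXCLUSION_PATTERNS = [
    re.compile(r"(^|/)__init__\.py$"),
]


def is_excluded(path: str) -> bool:
    """Return True if the repo-relative path matches an exclusion rule."""
    normalized = path.replace("\\", "/")  # tolerate Windows-style separators
    return any(pattern.search(normalized) for pattern in EXCLUSION_PATTERNS)
```

Anchoring the pattern on a path separator keeps a file such as `my__init__.py` from being excluded by accident.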

License cleanup process for seed code (to bring essential files into alignment with this best practice):

  • Clone all repos
  • Run scripts to identify missing licenses, with exclusions
    • starting with the most common / substantial code, data, document files
  • Run scripts to update the files and generate git diff
  • Send report of identified files and git diff to project lead
  • The project lead decides whether to update the files through their existing process or to commit the script-updated files
  • Updated files are submitted and tested to confirm there are no side effects
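
The "identify missing licenses" step above could be sketched as follows. This is an illustration under stated assumptions, not the actual Acumos script: the marker string, the set of scanned extensions, the number of header lines inspected, and the function names are all assumptions made for the example.

```python
import os

# Assumption: a file counts as licensed if this marker appears near the top.
LICENSE_MARKER = "Licensed under the Apache License"
# Assumption: only these extensions are scanned in a first pass.
SCANNED_EXTENSIONS = {".py", ".java", ".sh", ".md"}
HEADER_LINES = 30  # how many leading lines to inspect


def has_license(path: str) -> bool:
    """Check whether the marker occurs in the first HEADER_LINES lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for _, line in zip(range(HEADER_LINES), handle):
            if LICENSE_MARKER in line:
                return True
    return False


def find_unlicensed(repo_root: str, excluded_names=("__init__.py",)) -> list:
    """Return sorted repo-relative paths of scanned files lacking the marker."""
    missing = []
    for folder, dirs, files in os.walk(repo_root):
        dirs[:] = [d for d in dirs if d != ".git"]  # skip git internals
        for name in files:
            if name in excluded_names:
                continue
            if os.path.splitext(name)[1] not in SCANNED_EXTENSIONS:
                continue
            full = os.path.join(folder, name)
            if not has_license(full):
                missing.append(os.path.relpath(full, repo_root))
    return sorted(missing)
```

The returned list is the raw material for the report sent to the project lead; file types outside `SCANNED_EXTENSIONS` are silently skipped here, which is only acceptable for a first cleanup pass.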

Periodically, until "Ongoing Process" is stood up:

  • Clone all repos
  • Run scripts to identify missing licenses, with exclusions
  • Send report of identified files to project lead, who takes further actions

Pre-launch process preparation

  • Create a comprehensive list of all file types in Acumos repos, indicating why/when each will need to include a license, along with the specific filename filter regex rules to use in the Gerrit commit gate process
  • Develop/update Gerrit commit gate job
  • Prepare license templates for each file type that can carry a license
  • Prepare how-to's for file types that can include metadata licenses
  • Review current repos for embedded non-Acumos code, and
    • develop practices for when/how this is allowed
    • explain other options the developer should consider to avoid embedding non-Acumos code
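
As an illustration of per-file-type license templates, the sketch below wraps a notice in the comment syntax of a given file type. The mapping `COMMENT_STYLES` and the function `render_header` are hypothetical, and the extensions listed are examples rather than the project's agreed list.

```python
# Hypothetical mapping from file extension to comment syntax:
# (block opener, per-line prefix, block closer). Empty strings mean "none".
COMMENT_STYLES = {
    ".py": ("", "# ", ""),
    ".sh": ("", "# ", ""),
    ".java": ("/*", " * ", " */"),
    ".html": ("<!--", "  ", "-->"),
}


def render_header(notice_lines, extension):
    """Wrap notice_lines in the comment syntax registered for extension."""
    if extension not in COMMENT_STYLES:
        raise KeyError(f"no license template for {extension!r}")
    opener, prefix, closer = COMMENT_STYLES[extension]
    # rstrip avoids trailing whitespace on blank notice lines
    body = [(prefix + line).rstrip() for line in notice_lines]
    parts = ([opener] if opener else []) + body + ([closer] if closer else [])
    return "\n".join(parts) + "\n"
```

Raising an error for an unregistered extension mirrors the intent that file types without a template get reviewed rather than skipped.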

Ongoing Process

  • Ensure developers are using license insertion plugins (e.g. for Maven) where provided
  • License check commit gate for file types that should carry licenses, with regex rules for exclusions
  • License check will fail for new file types that are not covered by current rules, kicking off a review process in which the rules will be updated as needed
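
The gate logic described above can be sketched as a three-way classification: excluded, license required, or not covered by any rule. The rule tables and function names below are hypothetical, and how the gate learns which files already carry a license (the `licensed_files` argument) is left as an assumption.

```python
import re

# Hypothetical rule tables; real rules would come from the file-type list.
REQUIRES_LICENSE = [re.compile(r"\.(py|java|sh|md|rst)$")]
EXCLUDED = [re.compile(r"(^|/)__init__\.py$")]


def classify(path):
    """Return 'excluded', 'required', or 'unknown' for a changed file."""
    if any(rule.search(path) for rule in EXCLUDED):
        return "excluded"
    if any(rule.search(path) for rule in REQUIRES_LICENSE):
        return "required"
    return "unknown"


def gate(changed_files, licensed_files):
    """Return failure messages; an empty list means the gate passes."""
    failures = []
    for path in changed_files:
        kind = classify(path)
        if kind == "required" and path not in licensed_files:
            failures.append(f"{path}: missing license")
        elif kind == "unknown":
            failures.append(f"{path}: file type not covered by rules, review needed")
    return failures
```

Exclusions are checked first so that `__init__.py` is not caught by the broader `.py` rule, and an unknown file type fails the gate instead of passing silently.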

Licensing artifacts

Artifacts that are generated via Acumos build processes, e.g. container images, should have licenses clearly identified in metadata as provided by the artifact repository, e.g.

  • for containers published on Docker Hub, add under the "Full Description":
    • a description of what the container provides and links to the build scripts/process
    • a general reference to Acumos and additional software included in the container, licensed per the source of that software, e.g.
      • "The container includes code developed under the Acumos project and distributed under the license below (# LICENSE #), and other code installed as part of the container build process, separately licensed per the source of that code."
    • an Acumos license text for the container, same as provided in the code (Apache 2.0)
  • (guidance for other artifacts to be provided)

Additional guidance

Imported code

Projects may need to import code from other sources, for example to extend or fork it. While this is generally not recommended because of the overhead (e.g. of forking projects), it may be necessary in some cases. Where possible, imported but unchanged code should not be included in repos; instead it should be pulled in as needed during build processes or at runtime. More specific guidance on how to do that will be provided in the developer wiki.

In general, code that is imported and updated in an Acumos project needs to keep its original license untouched, with a new license appended to it. Any new, substantial file created within a block of related code that was imported from another project must carry an Acumos license. This is because the imported code may not have had a license in each file, so provenance would become unclear if the Acumos project did not explicitly add a license to each file it added. Additionally, for imported files without an internal license, the origin and the related license need to be stated somewhere, e.g. in a README file in the top folder of the imported code block. The Acumos project needs to avoid unclear provenance whether or not it is known to cause a problem (we can’t anticipate all problems with it), so every substantial file that we add to an imported project should have an Acumos license embedded.
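
For a modified imported file, "original license untouched, new license appended" could look like the sketch below. It assumes the original license is a leading block of line comments (here with a `#` prefix) and that the new notice is already rendered as comment lines; the function name `append_notice` is hypothetical, and the notice text in the usage example is a placeholder.

```python
def append_notice(source: str, new_notice: str, prefix: str = "#") -> str:
    """Insert new_notice after the leading comment block, leaving it untouched.

    If the file has no leading comment block, the notice goes at the top.
    """
    lines = source.splitlines(keepends=True)
    end = 0
    # Walk past the leading comment block (assumed to hold the original license).
    while end < len(lines) and lines[end].lstrip().startswith(prefix):
        end += 1
    separator = "\n" if end else ""  # blank line between the two notices
    return "".join(lines[:end]) + separator + new_notice + "".join(lines[end:])


original = "# Original license\n# MIT\nimport os\n"
updated = append_notice(original, "# Placeholder for the added project notice\n")
```

Because the original header lines are copied through unchanged, a later provenance scan can still see both the upstream license and the added one.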
