Artifact Naming and Structure Best Practices

To help your coworkers understand your folder structure and recognize your artifacts, Datameer suggests best practices for the following areas:

  • Structuring your work in Datameer across artifacts in a folder hierarchy
  • Naming artifacts
  • Naming sheets in a workbook or infographic
  • Naming columns within a sheet

Folder Hierarchy

For data resources (such as connections, import jobs, and file uploads), Datameer suggests having a global Data folder. This folder can be shared across many users so that each data source is imported only once rather than redundantly.

Besides a global data resources layer, you can also globally share preprocessed data sets in which common data cleansing steps have already been applied. Similar to data resources, this can be a shared top-level folder that contains workbooks with sheets of cleansed data.

For your actual analyses, create an Analytics folder under your home folder. This is where your personal work resides. 

Inside a project, Datameer recommends keeping data resources (those you need in addition to the globally accessible ones), workbooks, and visualizations separate from one another.
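As an illustration only, the hierarchy described above might look like the following for a project called CustomerSegmentation. This Python sketch is not a Datameer API; the home folder path /Users/jdoe, the shared folder name CleansedData, and the subfolder names Workbooks and Visualizations are hypothetical. Only Data, Analytics, and Resources come from this page.

```python
from pathlib import PurePosixPath
from typing import List

# Hypothetical paths: "Data" and "Analytics" follow this page; the home folder,
# "CleansedData", "Workbooks", and "Visualizations" are illustrative assumptions.
SHARED_ROOT = PurePosixPath("/")
HOME = PurePosixPath("/Users/jdoe")


def project_layout(project: str) -> List[str]:
    """Return the suggested folder paths for one analysis project."""
    project_root = HOME / "Analytics" / project
    return [
        str(SHARED_ROOT / "Data"),             # global data resources, imported once
        str(SHARED_ROOT / "CleansedData"),     # shared workbooks with cleansed data
        str(project_root / "Resources"),       # project-specific data resources
        str(project_root / "Workbooks"),       # the project's workbooks
        str(project_root / "Visualizations"),  # the project's infographics
    ]


for path in project_layout("CustomerSegmentation"):
    print(path)
```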

Naming Artifacts

Name folders in camelCase, beginning with an uppercase character (this variant is also known as upper camel case or PascalCase). Examples:

  • Resources
  • CustomerSegmentation
  • RevenueComparison2015
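If you want to check folder names outside Datameer, a regular expression is one option. The following Python sketch is an assumption about how such a check could look; it only verifies the first character and that the name consists of letters and digits, so it cannot tell whether every word starts with a capital (Customersegmentation would pass).

```python
import re

# Upper camel case: an uppercase letter, then letters and digits only
# (no spaces, underscores, or other punctuation).
FOLDER_NAME = re.compile(r"[A-Z][A-Za-z0-9]*")


def is_valid_folder_name(name: str) -> bool:
    """Return True if the folder name follows the upper camel case convention."""
    return FOLDER_NAME.fullmatch(name) is not None


for name in ["Resources", "RevenueComparison2015", "customerSegmentation", "Customer Segmentation"]:
    print(f"{name!r}: {is_valid_folder_name(name)}")
```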

Name artifacts in camelCase, beginning with a lowercase character. Examples:

  • transactions
  • customerAccounts
  • annualRevenue
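To derive a conforming name from a plain-language title, you can join its words in camelCase. The helper to_camel_case below is a hypothetical Python sketch; its upper_first flag produces the uppercase-first form used for folders. Because it lowercases each word before joining, acronyms such as USD come out as Usd, so review the result.

```python
import re


def to_camel_case(text: str, upper_first: bool = False) -> str:
    """Join the words of `text` in camelCase (upper camel case if upper_first)."""
    words = [w.lower() for w in re.findall(r"[A-Za-z0-9]+", text)]
    if not words:
        raise ValueError("text contains no letters or digits")
    head = words[0].capitalize() if upper_first else words[0]
    return head + "".join(w.capitalize() for w in words[1:])


print(to_camel_case("customer accounts"))                          # customerAccounts
print(to_camel_case("annual revenue"))                             # annualRevenue
print(to_camel_case("revenue comparison 2015", upper_first=True))  # RevenueComparison2015
```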

Don't include the artifact type in its name, since the type is already explicit in the suffix. For example, avoid names such as:

  • accountUpload.upl
  • workbookRevenueComparison.wbk
  • resultExportJob.exp
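A simple check can flag names that repeat their type. In this Python sketch, TYPE_WORDS is a hypothetical list of type words, not an official Datameer list, and the suffix is assumed to be everything after the last period. The name is split into camelCase words first, so a name like importantCustomers is not flagged for containing "import".

```python
import re
from typing import List

# Hypothetical type words that duplicate what the suffix (.upl, .wbk, .exp, ...) already says.
TYPE_WORDS = {"upload", "workbook", "export", "import", "job", "connection", "infographic"}


def camel_words(name: str) -> List[str]:
    """Split a camelCase name into lowercase words, e.g. resultExportJob -> result, export, job."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", name)]


def redundant_type_words(filename: str) -> List[str]:
    """Return the type words found in the name part (before the last '.')."""
    stem = filename.rsplit(".", 1)[0]
    return [w for w in camel_words(stem) if w in TYPE_WORDS]


for filename in ["accountUpload.upl", "workbookRevenueComparison.wbk", "resultExportJob.exp", "annualRevenue.wbk"]:
    print(filename, redundant_type_words(filename))
```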

When multiple workbooks make up different steps of an analysis, prefix their names with _01_, _02_, and so on to reflect the order of the steps.
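If you generate such names programmatically, zero-padding the step number keeps alphabetical order identical to step order for up to 99 steps. The helper and the workbook names below are hypothetical, shown as a minimal Python sketch.

```python
from typing import List


def with_step_prefixes(names: List[str]) -> List[str]:
    """Prefix workbook names with _01_, _02_, ... in the order given."""
    return [f"_{step:02d}_{name}" for step, name in enumerate(names, start=1)]


# Hypothetical workbook names for a three-step analysis.
print(with_step_prefixes(["cleansedTransactions", "sessionJoin", "revenueAggregation"]))
# ['_01_cleansedTransactions', '_02_sessionJoin', '_03_revenueAggregation']
```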

Naming Sheets

Name sheets in camelCase, starting with a lowercase character:

  • readingsTokenized
  • usersByAge

Use an underscore to distinguish sheets that implement different versions of the same logic:

  • meterReadings_5movingAverage
  • meterReadings_10movingAverage

You can use a leading underscore for sheets that are not essential to the analysis itself but show intermediate results for information only:

  • _totalNumberOfRecords
  • _filteredExampleUserJohnSmith
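Taken together, the sheet-naming rules describe one pattern: an optional leading underscore, a camelCase base name, and an optional underscore-separated variant. The Python parser below is a sketch of that reading of the rules; parse_sheet_name and SheetName are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Optional leading "_" (informational sheet), camelCase base, optional "_variant".
SHEET_NAME = re.compile(
    r"(?P<info>_)?(?P<base>[a-z][A-Za-z0-9]*)(?:_(?P<variant>[A-Za-z0-9]+))?"
)


class SheetName(NamedTuple):
    base: str
    variant: Optional[str]
    informational: bool


def parse_sheet_name(name: str) -> SheetName:
    """Split a sheet name into its parts, or raise ValueError if it does not conform."""
    match = SHEET_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"sheet name does not follow the convention: {name!r}")
    return SheetName(match["base"], match["variant"], match["info"] is not None)


for name in ["readingsTokenized", "meterReadings_10movingAverage", "_filteredExampleUserJohnSmith"]:
    print(parse_sheet_name(name))
```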

Naming Columns

Name columns in camelCase, beginning with a lowercase character. The name should describe the attribute the column represents:

  • customerId
  • registrationDate

Use an underscore to separate a unit from the column name:

  • transactionAmount_USD
  • sessionDuration_minutes
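A similar pattern can check a row of column headers and extract each unit. The check_columns function below is a hypothetical Python sketch; it expects at most one underscore, followed by the unit, and reports every header that does not match.

```python
import re
from typing import Dict, List, Optional, Tuple

# camelCase attribute name, optionally followed by "_unit".
COLUMN_NAME = re.compile(r"(?P<attribute>[a-z][A-Za-z0-9]*)(?:_(?P<unit>[A-Za-z0-9]+))?")


def check_columns(headers: List[str]) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Map each conforming header to its unit (or None) and list non-conforming headers."""
    units: Dict[str, Optional[str]] = {}
    problems: List[str] = []
    for header in headers:
        match = COLUMN_NAME.fullmatch(header)
        if match is None:
            problems.append(header)
        else:
            units[header] = match["unit"]
    return units, problems


units, problems = check_columns(
    ["customerId", "registrationDate", "transactionAmount_USD", "sessionDuration_minutes", "Customer ID"]
)
print(units)
print(problems)
```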

Further Information

Datameer vFS File Naming Convention