I was recently asked to develop a report that would list files with specific file extensions in several SharePoint 2010 web applications. We were adding some new blocked file types but desired a report of all preexisting files of those types anywhere in several web applications.
I did this by using PowerShell and the Out-File command to create a .csv (comma separated value) report file which can be opened by Microsoft Excel and many other programs and a .log logging file to use as an audit trail of all the sites and libraries that were scanned.
I determined that I also wanted to make this script generic and reusable, therefore I made it parameter driven. The 3 parameters I used are:
<#
.PARAMETER Urls : A comma delimited list of web apps or site collection Urls
.PARAMETER fileTypesPattern: A REGEX matching pattern to be used against file names
.PARAMETER scope : A string indicating if the execution scope is web app or site collection
#>
param(
[System.String]
$Urls="http://neerc-win2008r2/,http://neerc.mysites.com/",
[System.String]
$fileTypesPattern="(.*?)\.(cgm|iges|igs|jt|wire|x_t)$",
[System.String]
$scope="WebApplication")
.PARAMETER Urls : A comma delimited list of web apps or site collection Urls
.PARAMETER fileTypesPattern: A REGEX matching pattern to be used against file names
.PARAMETER scope : A string indicating if the execution scope is web app or site collection
#>
param(
[System.String]
$Urls="http://neerc-win2008r2/,http://neerc.mysites.com/",
[System.String]
$fileTypesPattern="(.*?)\.(cgm|iges|igs|jt|wire|x_t)$",
[System.String]
$scope="WebApplication")
With these 3 parameters the script can be scoped at the Web Application level or the Site Collection level, it can process one or more Web Applications or Site Collections, and the file name matching pattern can be any valid REGEX pattern.
The script needs to output 2 files: one for the .csv report of files found, and one for the audit log. I didn't want to deal with overwriting files so I used file names that incorporate a time stamp to make them unique. I also initialized the files with some header information, as follows;
The script needs to output 2 files: one for the .csv report of files found, and one for the audit log. I didn't want to deal with overwriting files so I used file names that incorporate a time stamp to make them unique. I also initialized the files with some header information, as follows;
#Create some timestamped file names
[DateTime]$StartTime = [DateTime]::Now
$runTime = $StartTime.ToString("yyyyMMdd-HHmmss")
$outPath = (Get-Item $myInvocation.MyCommand.Path).DirectoryName + "\"
$Report_File = ($outPath + $runTime + "_FileTypeList.csv")
$Log_File = ($outPath + $runTime + "_FileTypeList.log")
#initialize the report and log files
Out-File -inputObject "Site URL,Library,Document Title,File Url,Created By,Last Modified By" -filepath $Report_File
$Log_Row = "Started: {0}" -f $StartTime.ToString("yyyy/mm/dd HH:mm:ss")
Out-File -inputObject $Log_Row -filepath $Log_File
$Log_Row = "Urls=[{0}]" -f $Urls
Out-File -inputObject $Log_Row -filepath $Log_File -append
$Log_Row = "fileTypesPattern=[{0}]" -f $fileTypesPattern
Out-File -inputObject $Log_Row -filepath $Log_File -append
$Log_Row = "Scope=[{0}]" -f $scope
Out-File -inputObject $Log_Row -filepath $Log_File -append
[DateTime]$StartTime = [DateTime]::Now
$runTime = $StartTime.ToString("yyyyMMdd-HHmmss")
$outPath = (Get-Item $myInvocation.MyCommand.Path).DirectoryName + "\"
$Report_File = ($outPath + $runTime + "_FileTypeList.csv")
$Log_File = ($outPath + $runTime + "_FileTypeList.log")
#initialize the report and log files
Out-File -inputObject "Site URL,Library,Document Title,File Url,Created By,Last Modified By" -filepath $Report_File
$Log_Row = "Started: {0}" -f $StartTime.ToString("yyyy/mm/dd HH:mm:ss")
Out-File -inputObject $Log_Row -filepath $Log_File
$Log_Row = "Urls=[{0}]" -f $Urls
Out-File -inputObject $Log_Row -filepath $Log_File -append
$Log_Row = "fileTypesPattern=[{0}]" -f $fileTypesPattern
Out-File -inputObject $Log_Row -filepath $Log_File -append
$Log_Row = "Scope=[{0}]" -f $scope
Out-File -inputObject $Log_Row -filepath $Log_File -append
Next, we'll break the passed Urls into an array and process each based on the scope being a web application or a site collection. I put the actual scan into a function where I can pass a site collection to be scanned. So the code below loops through the passed Urls, calls my scan function for each site collection, writes some log information, and totals the file count:
#Initialize total counter
$TotalFileCount = 0
#Get the webapps into an array
$UrlArray = $Urls.Split(",")
#loop through each passed Url
foreach ($Url in $UrlArray)
{
#if scope is webapplication, loop through each site collection
if ($scope.ToLower().Contains("webapp"))
{
$web = Get-SPWebApplication $Url
$Log_Row = "Scanning Web Application: {0} ({1})" -f $web.url, $web.name
Out-File -inputObject $Log_Row -filepath $Log_File -append
write-host
write-host -fore GREEN $Log_Row
#loop through each site collection
foreach ($site in $web.Sites)
{
$TotalFileCount += ScanSiteCollection $site
$site.Dispose()
}
}
else
{
$site = Get-SPSite $Url
$Log_Row = "Scanning Site Collection: {0}" -f $site.url
Out-File -inputObject $Log_Row -filepath $Log_File -append
write-host
write-host -fore GREEN $Log_Row
$TotalFileCount += ScanSiteCollection $site
$site.Dispose()
}
}
$TotalFileCount = 0
#Get the webapps into an array
$UrlArray = $Urls.Split(",")
#loop through each passed Url
foreach ($Url in $UrlArray)
{
#if scope is webapplication, loop through each site collection
if ($scope.ToLower().Contains("webapp"))
{
$web = Get-SPWebApplication $Url
$Log_Row = "Scanning Web Application: {0} ({1})" -f $web.url, $web.name
Out-File -inputObject $Log_Row -filepath $Log_File -append
write-host
write-host -fore GREEN $Log_Row
#loop through each site collection
foreach ($site in $web.Sites)
{
$TotalFileCount += ScanSiteCollection $site
$site.Dispose()
}
}
else
{
$site = Get-SPSite $Url
$Log_Row = "Scanning Site Collection: {0}" -f $site.url
Out-File -inputObject $Log_Row -filepath $Log_File -append
write-host
write-host -fore GREEN $Log_Row
$TotalFileCount += ScanSiteCollection $site
$site.Dispose()
}
}
Now our function will loop through each sub site in the site collection, accessing all of its lists and its root folder.
Discussion Boards:
We'll check the base template of each list to see if it is a Discussion Board since discussion boards can have attachments and are accessed differently than attachments to list items. We will check each item for attachments, and then select and report only those attachments that match our pattern.
All other lists:
We will build a files array for items which are files. Again, we will check each item for attachments, and then select and report only those attachments that match our pattern. If we have any files in our array, we will select and report only those files that match our pattern.
End Of Sub Site loop:
We will write summary data for the sub site to the logging file for audit purposes.
At the end of our function, we will write some information to the console before we return.
Discussion Boards:
We'll check the base template of each list to see if it is a Discussion Board since discussion boards can have attachments and are accessed differently than attachments to list items. We will check each item for attachments, and then select and report only those attachments that match our pattern.
All other lists:
We will build a files array for items which are files. Again, we will check each item for attachments, and then select and report only those attachments that match our pattern. If we have any files in our array, we will select and report only those files that match our pattern.
End Of Sub Site loop:
We will write summary data for the sub site to the logging file for audit purposes.
At the end of our function, we will write some information to the console before we return.
<#
Function Name: Scan Site Collection
Purpose : Scans for file names matching the file types pattern
Parameter : SiteCollection
#>
function ScanSiteCollection ($siteCollection)
{
$TotalCount = 0
#loop through each sub site
foreach ($subweb in $siteCollection.AllWebs)
{
$Log_Row = "{0}" -f $subweb.Url
Out-File -inputObject $Log_Row -filepath $Log_File -append
$libs = @()
$libs += $subweb.Lists #get all the libs/lists
$libs += $subweb.RootFolder #Add the rootfolder
$SubWebFileCount = 0
for ($i=0;$i -lt $libs.Count;$i++)
{
try
{
$LibFileCount=0
#Handle attachments to discussion board items
if ($libs[$i].BaseTemplate -eq "DiscussionBoard")
{
#Go through all discussions that have attachments
foreach ($disc in ($libs[$i].Folders | Where-Object {($_.Attachments.Count -gt 0)}))
{
#Get those that match our pattern
foreach ($att in ($disc.Attachments | Where-Object {[regex]::IsMatch($_.ToLower(), $fileTypesPattern, "IgnoreCase")}))
{
$SubWebFileCount++
$LibFileCount++
$Report_Row = "{0},{1},{2},{3},{4},{5}" -f $subweb.Url, $libs[$i].Title, $disc.Title, $att, $disc["Author"], $disc["Editor"]
Out-File -inputObject $Report_Row -filepath $Report_File -append
}
}
}
else
{
if ($i -eq ($libs.Count - 1))
{
$files = $libs[$i].Files #Handle the root folder
}
else
{
$files = @()
foreach ($item in $libs[$i].Items)
{
#Handle attachments to list items
if ($item.Attachments.Count -gt 0)
{
#If we have attached files, get those that match our pattern
foreach ($att in ($item.Attachments | Where-Object {[regex]::IsMatch($_.ToLower(), $fileTypesPattern, "IgnoreCase")}))
{
$SubWebFileCount++
$LibFileCount++
$Report_Row = "{0},{1},{2},{3},{4},{5}" -f $subweb.Url, $libs[$i].Title, $item.Title, $att, $item["Author"], $item["Editor"]
Out-File -inputObject $Report_Row -filepath $Report_File -append
}
}
if ($item.File)
{
$files += $item.File #Add to files
}
}
}
#If we have files, get those that match our pattern
if (($files.count -gt 0) -and ($files.count))
{
$libMatchCount = 0
foreach ($file in ($files | Where-Object {[regex]::IsMatch($_.Url.ToLower(), $fileTypesPattern, "IgnoreCase")}))
{
$SubWebFileCount++
$LibFileCount++
$Report_Row = "{0},{1},{2},{3},{4},{5}" -f $subweb.Url, $file.DocumentLibrary, $file.Title, $file.url, $file.Author, $file.ModifiedBy
Out-File -inputObject $Report_Row -filepath $Report_File -append
}
}
}
#Write lib file count to log file
$Log_Row = "{0,5:0},{1},{2}" -f $LibFileCount, $libs[$i].Title, $libs[$i].BaseTemplate
Out-File -inputObject $Log_Row -filepath $Log_File -append
}
catch [System.Management.Automation.GetValueInvocationException], [System.Management.Automation.MethodInvocationException]
{
#Write some error information to the log file and continue
$Log_Row = "InvocationException,{0},{1}" -f $libs[$i].Title, $libs[$i].BaseTemplate
Out-File -inputObject $Log_Row -filepath $Log_File -append
continue
}
catch [System.Management.Automation.ExtendedTypeSystemException]
{
#Write some error information to the log file and continue
$Log_Row = "ExtendedTypeSystemException,{0},{1}" -f $libs[$i].Title, $libs[$i].BaseTemplate
Out-File -inputObject $Log_Row -filepath $Log_File -append
continue
}
}
#Subweb totals
$TotalCount += $SubWebFileCount
if ($SubWebFileCount -gt 0)
{
$Msg = "{0,5:0},{1}" -f $SubWebFileCount, $subweb.Url
write-host -fore GREEN $Msg
}
}
return $TotalCount
}
Function Name: Scan Site Collection
Purpose : Scans for file names matching the file types pattern
Parameter : SiteCollection
#>
function ScanSiteCollection ($siteCollection)
{
$TotalCount = 0
#loop through each sub site
foreach ($subweb in $siteCollection.AllWebs)
{
$Log_Row = "{0}" -f $subweb.Url
Out-File -inputObject $Log_Row -filepath $Log_File -append
$libs = @()
$libs += $subweb.Lists #get all the libs/lists
$libs += $subweb.RootFolder #Add the rootfolder
$SubWebFileCount = 0
for ($i=0;$i -lt $libs.Count;$i++)
{
try
{
$LibFileCount=0
#Handle attachments to discussion board items
if ($libs[$i].BaseTemplate -eq "DiscussionBoard")
{
#Go through all discussions that have attachments
foreach ($disc in ($libs[$i].Folders | Where-Object {($_.Attachments.Count -gt 0)}))
{
#Get those that match our pattern
foreach ($att in ($disc.Attachments | Where-Object {[regex]::IsMatch($_.ToLower(), $fileTypesPattern, "IgnoreCase")}))
{
$SubWebFileCount++
$LibFileCount++
$Report_Row = "{0},{1},{2},{3},{4},{5}" -f $subweb.Url, $libs[$i].Title, $disc.Title, $att, $disc["Author"], $disc["Editor"]
Out-File -inputObject $Report_Row -filepath $Report_File -append
}
}
}
else
{
if ($i -eq ($libs.Count - 1))
{
$files = $libs[$i].Files #Handle the root folder
}
else
{
$files = @()
foreach ($item in $libs[$i].Items)
{
#Handle attachments to list items
if ($item.Attachments.Count -gt 0)
{
#If we have attached files, get those that match our pattern
foreach ($att in ($item.Attachments | Where-Object {[regex]::IsMatch($_.ToLower(), $fileTypesPattern, "IgnoreCase")}))
{
$SubWebFileCount++
$LibFileCount++
$Report_Row = "{0},{1},{2},{3},{4},{5}" -f $subweb.Url, $libs[$i].Title, $item.Title, $att, $item["Author"], $item["Editor"]
Out-File -inputObject $Report_Row -filepath $Report_File -append
}
}
if ($item.File)
{
$files += $item.File #Add to files
}
}
}
#If we have files, get those that match our pattern
if (($files.count -gt 0) -and ($files.count))
{
$libMatchCount = 0
foreach ($file in ($files | Where-Object {[regex]::IsMatch($_.Url.ToLower(), $fileTypesPattern, "IgnoreCase")}))
{
$SubWebFileCount++
$LibFileCount++
$Report_Row = "{0},{1},{2},{3},{4},{5}" -f $subweb.Url, $file.DocumentLibrary, $file.Title, $file.url, $file.Author, $file.ModifiedBy
Out-File -inputObject $Report_Row -filepath $Report_File -append
}
}
}
#Write lib file count to log file
$Log_Row = "{0,5:0},{1},{2}" -f $LibFileCount, $libs[$i].Title, $libs[$i].BaseTemplate
Out-File -inputObject $Log_Row -filepath $Log_File -append
}
catch [System.Management.Automation.GetValueInvocationException], [System.Management.Automation.MethodInvocationException]
{
#Write some error information to the log file and continue
$Log_Row = "InvocationException,{0},{1}" -f $libs[$i].Title, $libs[$i].BaseTemplate
Out-File -inputObject $Log_Row -filepath $Log_File -append
continue
}
catch [System.Management.Automation.ExtendedTypeSystemException]
{
#Write some error information to the log file and continue
$Log_Row = "ExtendedTypeSystemException,{0},{1}" -f $libs[$i].Title, $libs[$i].BaseTemplate
Out-File -inputObject $Log_Row -filepath $Log_File -append
continue
}
}
#Subweb totals
$TotalCount += $SubWebFileCount
if ($SubWebFileCount -gt 0)
{
$Msg = "{0,5:0},{1}" -f $SubWebFileCount, $subweb.Url
write-host -fore GREEN $Msg
}
}
return $TotalCount
}
Warning: This script could run for a long time, depending on the number of site collections, sub sites, libraries, and documents it needs to scan.
If you only need to search document libraries you could use an SPSiteDataQuery, which might look like the following:
#Set the site collection to query
$sc = Get-SPSite -Identity "http://yourWebApp/sites/yourSiteCollection"
#Add SharePoint snapin if needed
Add-PSSnapin Microsoft.SharePoint.Powershell -ErrorAction "SilentlyContinue"
#Create a new SPSiteQuery object
$CAMLQuery = New-Object -typeName Microsoft.SharePoint.SPSiteDataQuery
#Set the query properties
$CAMLQuery.Lists = '<Lists BaseType="1" />'
$CAMLQuery.Query = '<Where><Or><Or><Or><Contains><FieldRef Name="FileRef" /><Value Type="Text">.cgm</Value></Contains><Contains><FieldRef Name="FileRef" /><Value Type="Text">.iges</Value></Contains></Or><Or><Contains><FieldRef Name="FileRef" /><Value Type="Text">.igs</Value></Contains><Contains><FieldRef Name="FileRef" /><Value Type="Text">.jt</Value></Contains></Or></Or><Or><Contains><FieldRef Name="FileRef" /><Value Type="Text">.wire</Value></Contains><Contains><FieldRef Name="FileRef" /><Value Type="Text">.x_t</Value></Contains></Or></Or></Where>'
$CAMLQuery.ViewFields = '<FieldRef Name="FileRef" /><FieldRef Name="Title" /><FieldRef Name="Author" /><FieldRef Name="Editor" />'
$CAMLQuery.Webs = '<Webs Scope="SiteCollection" />'
#Create a data table for the query results
$fileTable = New-Object -typeName System.Data.DataTable
#Execute the query
$fileTable = $sc.RootWeb.GetSiteData($CAMLQuery)
#Write the data to the console
foreach ($f in $fileTable.Rows)
{
$csv = "{0},{1},{2},{3}" -f $f["FileRef"], $f["Title"], $f["Author"], $f["Editor"]
write-host -fore GREEN $csv
}
$sc.Dispose()
This is my first technical blog post, constructive feedback is appreciated!