Most SharePoint blogs tend to tell you cool stuff that the author did. Sometimes telling the dumb stuff is worthwhile too. I am in touch with my inner Homer Simpson, so I will tell you a quick story about one of my recent stupider moments…
This is a story about anchoring bias – an issue that many of us can get tripped up by. In case you are not aware, Anchoring is the tendency to be over-reliant on the some information (the “anchor”) when making subsequent decisions. Once an anchor is set in place, subsequent judgments are made by interpreting other information around the anchor.
So I had just used content deployment, in combination with some PowerShell, to push a SharePoint environment from the development environment to the test environment and it had all gone well. I ran through test cases and was satisfied that all was cool. Then another team member brought to my attention that search was not returning the same results in test as in development. I took a look and sure enough, one of the search scopes was reporting way less results than I was expecting. The issue was confined to one pages library in particular, and I accessed the library and confirmed that the pages had successfully migrated and were rendering fine.
Now I had used a PowerShell script to export the exclusions, crawled/managed properties and best bets of the development farm search application, subsequently import into test. So given the reported issue was via search results, the anchor was well and truly set. The issue had to be search right? Maybe the script had a fault?
So as one would do, I checked the crawl logs and confirmed that some items in the affected library were being crawled OK. I then double checked the web app policy for the search crawl account and made sure it had the appropriate permissions. it was good. I removed the crawl exclusions just in case they were excluding more than what they reported to be and I also I removed any proxy configuration from the search crawl account as I have seen proxy issues with crawling before.
I re-crawled and the problem persisted… hmm
I logged into the affected site as the crawl account itself and examined this problematic library. I immediately noticed that I could not see a particular folder where significant content resided. This accounted for the search discrepancy, but checking permissions confirmed that this was not an issue. The library inherited its permissions. So I created another view on the library that was set to not show folders, and when I checked that view, I could see all the affected files and their state was set to “Approved”. Damn! This really threw me. Why the hell would search account not see a folder but see the files within it when I changed the view not to include folders?
Still well and truly affected by my anchoring bias towards search, I started to consider possibilities that defied rational logic in hindsight. I wondered if there was some weird issue with the crawl account, so I had another search crawl account created and retested the issue and still the problem persisted. Then I temporarily granted the search account site owner permission and was finally able to view the missing folder content when browsing to it, but I then attempted a full crawl and the results stubbornly refused to appear. I even reset the index in desperation.
Finally, I showed the behaviour of the library to a colleague, and he said “the folder is not approved”. (Massive clunk as the penny drops for me). Shit – how can I be so stupid?
For whatever reason, the folder in question was not approved, but the files were. The crawler was dutifully doing precisely what it was configured to do for an account that has read permission to the site. When I turned on the “no folder” view, of course I saw the files inside the folder because they were approved. Argh! So bloody obvious when you think about it. Approving the folder and running a crawl immediately made the problem go away.
What really bruised my tech guy ego even more was that I have previously sorted out this exact issue for others – many times in fact! Everybody knows that when content is visible for one party and not others, its usually approvals or publishing. So the fact that I got duped by the same issue I have frequently advised on was a bit deflating… except that this all happened on a Friday and as all geeks know, solving a problem on a Friday always trumps tech guy ego.
Thanks for reading