Opened 10 years ago

Last modified 6 years ago

#897 assigned Feature/Enhancement Request

Make private mailman archives searchable

Reported by: https://id.mayfirst.org/jamie
Owned by: https://id.mayfirst.org/jamie
Priority: Low
Component: Tech
Keywords: mailman-search needs-review
Cc:
Sensitive: no

Description

While lists.mayfirst.org allows searching of public archives, it is not possible to search private archives.

Short of waiting for an upgrade to Mailman (which may or may not have this feature), I don't know how to solve it.

Public archives are searchable because we don't have to worry about privacy. That allows us to use a third-party indexer like swish-e.

With private archives, on the other hand, we would need to figure out a way to properly restrict access to the search based on the mailman privacy settings. That strikes me as extremely complicated - unless it was done within mailman (which would also be complicated).

Attachments (3)

search.gif (26.9 KB) - added by https://id.mayfirst.org/grange 10 years ago.
search2.gif (35.5 KB) - added by https://id.mayfirst.org/grange 10 years ago.
search3.gif (35.0 KB) - added by https://id.mayfirst.org/grange 10 years ago.

Change History (25)

comment:1 Changed 10 years ago by https://id.mayfirst.org/jamie

  • Keywords mailman search added

comment:2 Changed 10 years ago by https://id.mayfirst.org/grange

Well this is a big bummer for lstnap.org. We have a number of legal aid related substantive discussion lists that just don't want their posts public. Housing advocates don't particularly want eviction defense strategy discussions to be accessible to LL attorneys.

With Mailman's clunky UI one of the redeeming qualities is a private archive. And as we all know Mailman's archive UI leaves a lot to be desired. If the archive had a search then we wouldn't care so much about Mailman archive browsability.

I hope you can figure something out here. Thanks for all your efforts.

Steve

comment:3 Changed 10 years ago by https://id.mayfirst.org/jamie

I think getting search-able private archives with mailman would be a pretty big step forward.

Some initial thoughts and research:

  • A recent post to the mailman lists suggests someone has figured it out. But the link is broken. I submitted a message to the web site that is hosting the solution asking for more information.
  • Another discussion has multiple suggested solutions; the most intriguing is to put the search page within the mailman directory layout in order to take advantage of mailman's own authentication system.
  • Another thought I had would be to create a separate search directory for each private list that shares a single http basic auth (could be controlled with a .htpasswd file or we could try to get openid to work). If a private mailman list wants a searchable archive, they could be set up on a case-by-case basis with a custom auth setup.

comment:4 Changed 10 years ago by https://id.mayfirst.org/grange

Just checking in on the status of this quest?

Steve

comment:5 follow-up: Changed 10 years ago by https://id.mayfirst.org/jamie

Thanks for the nudge Steve.

Solving this problem could also help us solve the problem of private RSS feeds (#962, #963, #969).

I just did a quick test of the second option (placing the protected files in the mailman directory layout itself) and was surprised to find that it worked.

I created a plain html file in the archive/private/listname/ directory.

I attempted to access the file and was prompted to enter my mailman email address and password. After I did that I was re-directed to the file I originally requested.

This could be very useful for making a simple solution to this ticket. It might be less useful for the RSS - since the mailman login is not a standard http auth login (meaning that scripts might have a harder time navigating it).

comment:6 follow-up: Changed 10 years ago by https://id.mayfirst.org/dkg

So does placing the search page within the mailman-protected area mean that the search engine is capable of indexing the protected data? I'd think that the indexing step itself is what needs to access the archives, not the search step.

Or are you suggesting that the main problem is the indexer returning data that should actually be private in its output, and this is a way to protect against that?

comment:7 in reply to: ↑ 6 Changed 10 years ago by https://id.mayfirst.org/jamie

Replying to https://id.mayfirst.org/dkg:

Or are you suggesting that the main problem is the indexer returning data that should actually be private in its output, and this is a way to protect against that?

Yes - this is what I am thinking. The indexer (swish-e) creates a separate index for each list and stores it outside the web tree. The web UI is what controls access to it. My thinking is that we would have a separate web ui for each list, placed in the mailman protected area. Swish-e seems pretty well designed, so we can use mostly symlinks I think - with different config files to indicate which index should be searched.

comment:8 in reply to: ↑ 5 Changed 10 years ago by https://id.mayfirst.org/jamie

I just did a quick test of the second option (placing the protected files in the mailman directory layout itself) and was surprised to find that it worked.

I did more testing. This is not the case unfortunately. This approach won't work.

When you visit a mailman URL in the form:

/mailman/private/list-name/file.html 

or with the default mailman configuration:

/cgi-bin/mailman/private/list-name/file.html

it seems as though you are directly accessing that file and mailman is somehow magically authenticating you.

In fact, you are executing a script. /mailman/private is a ScriptAlias that points to a mailman binary called private. Everything after private is fed to the private script as variables. It's the private binary that detects what list you are trying to access, authenticates you, and then displays the page requested. That works great for plain html files. However, when I tried to put the search perl cgi script in there, mailman's private binary returned the perl code rather than executing it.
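To illustrate how everything after private reaches the script, here is a minimal Python sketch. It is not Mailman's actual code. It assumes the web server hands the trailing part of the URL to the script as a PATH_INFO-style string; the function name split_private_path and the index.html default are hypothetical.

```python
def split_private_path(path_info):
    """Split '/list-name/file.html' into (list_name, relative_file).

    Hypothetical helper for illustration only; not part of Mailman.
    """
    # Drop empty segments produced by leading, trailing or doubled slashes.
    parts = [p for p in path_info.split("/") if p]
    if not parts:
        raise ValueError("no list name in path")
    list_name = parts[0]
    # Assumption: a bare list URL falls back to an index page.
    relative_file = "/".join(parts[1:]) or "index.html"
    return list_name, relative_file
```

A script built this way would authenticate against list_name and then read relative_file from disk and send its contents back. That is consistent with what happened to the perl cgi script: the file is treated as data to display, not as a program to run.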

I think the best option would be to figure out how to use Mailman code to evaluate whether a browser cookie is present and valid. Then the perl search script could detect the cookie and pass it to a command line program to be evaluated. If the cookie doesn't check out, the user would be redirected to log in to the private archives they are trying to search. If it does check out, they would get access to the search script.
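The flow proposed above could look roughly like the following Python sketch (the real search script is perl; Python is used here only for illustration). It assumes that cookie names for a list start with the list name followed by "+", and that the external checker is any command that exits with status 0 for a valid cookie. The names cookie_is_valid and choose_action are hypothetical.

```python
import subprocess
import sys


def cookie_is_valid(checker_cmd, cookie_name, cookie_value):
    """Run an external checker; exit status 0 is taken to mean 'valid'."""
    # Arguments are passed as a list, so no shell ever parses the cookie.
    result = subprocess.run(list(checker_cmd) + [cookie_name, cookie_value])
    return result.returncode == 0


def choose_action(cookies, list_name, checker_cmd):
    """Return 'search' if a cookie for this list validates, else 'login'."""
    for name, value in cookies.items():
        # Assumption: cookies belonging to a list are prefixed '<list>+'.
        if name.startswith(list_name + "+") and cookie_is_valid(
            checker_cmd, name, value
        ):
            return "search"
    return "login"
```

A caller would redirect to the private archive login page on "login" and run the search on "search".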

comment:9 Changed 10 years ago by https://id.mayfirst.org/jamie

We now have in place on lists.mayfirst.org beta private search functionality. Note: the proper search link will show up on your list table of contents page the next time it is regenerated (which should happen the next time a message is sent to the list).

I updated mailman-swish so that it can now search private mailman archives.

I ended up taking these steps:

  • I wrote check_cookie - a python script that belongs in /var/lib/mailman/bin/. It takes a cookie name as the first argument and a cookie value as the second argument and then tries to run the same code that mailman uses to verify the cookie (the cookie name has the list name and your username embedded in it).
  • I modified the swish-e config script to check for the existence of a cookie relevant to the list you are trying to search. This script then calls the check_cookie script. If check_cookie returns 1 or higher, it directs you to log in to your private archive. If check_cookie returns zero, you can search.
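The command line contract described in the steps above (cookie name as first argument, cookie value as second, zero on success, 1 or higher on failure) can be sketched as follows. This is not the real check_cookie: the actual script reuses Mailman's own verification code, which is not reproduced here. The HMAC comparison and the demo secret are stand-ins chosen purely to make the sketch runnable.

```python
import hashlib
import hmac


def verify(cookie_name, cookie_value, secret):
    """Stand-in verifier (NOT Mailman's): value must be an HMAC of the name."""
    expected = hmac.new(secret, cookie_name.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, cookie_value)


def main(argv, secret=b"demo-secret"):
    """Return 0 if the cookie is valid, 1 if it is not, 2 on bad usage."""
    if len(argv) != 3:
        return 2
    return 0 if verify(argv[1], argv[2], secret) else 1
```

As a script, the return value of main(sys.argv) would be handed to sys.exit so that the calling search script only has to look at the exit status.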

It's still a bit cumbersome (requiring sudo), but I think it does the trick. I'll leave this ticket open a little longer until it's had some play and we know it works.

comment:10 Changed 10 years ago by https://id.mayfirst.org/grange

This is great Jamie. Thanks for sticking with this. We will test it out.

comment:11 Changed 10 years ago by https://id.mayfirst.org/grange

OK, I just visited the lstech list archive - https://lists.mayfirst.org/cgi-bin/mailman/private/lstech/

clicked on "Search this list" and rec'd this error:

"Private archive file not found"

Steve

comment:12 Changed 10 years ago by https://id.mayfirst.org/jamie

Thanks for the report. I think I fixed the problem. Can you try it again?

I was adding the following ScriptAliasMatch line in /etc/apache2/conf.d/00-swish.conf

ScriptAliasMatch /cgi-bin/mailman/private/(.*)/search /var/lib/mailman/archives/private/$1/search/swish.cgi

However, the line in /etc/apache2/sites-available/ssl:

ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/

Seemed to be matching first. So, I added my ScriptAliasMatch line to the sites-available/ssl (placing it above the ScriptAlias line) and now it seems to be matching.
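The ordering effect can be pictured with a toy model in which the first rule that matches a URL wins. This Python sketch is only an analogy: how Apache actually merges alias directives across files and contexts is more involved. The function name resolve is hypothetical, and the targets use Python's \1 where the Apache lines use $1.

```python
import re


def resolve(rules, url):
    """Return the target of the first rule whose pattern matches the URL."""
    for pattern, target in rules:
        match = re.match(pattern, url)
        if match:
            # Substitute captured groups into the target template.
            return match.expand(target)
    return None


# Toy equivalents of the two directives quoted above.
GENERIC = (r"/cgi-bin/(.*)", r"/usr/lib/cgi-bin/\1")
SEARCH = (
    r"/cgi-bin/mailman/private/(.*)/search",
    r"/var/lib/mailman/archives/private/\1/search/swish.cgi",
)
```

With the generic rule first, a search URL is swallowed by it and never reaches swish.cgi. With the search rule first, the same URL maps to the per-list swish.cgi, which matches the behavior seen after moving the ScriptAliasMatch line above the ScriptAlias line.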

There's a new problem though - every time you click on a link you are directed to view the message via the shortened URL (without cgi-bin) - which directs you to log in. Once you've logged in, you are re-directed to the cgi-bin version of the URL. That's because I instructed Swish to use the shortened URL given ticket #450 (new lists are now being created with a shortened URL). I'm going to change it to use the cgi-bin URL. I think the answer might be to finish #450, changing all lists to emit a shortened URL (while still preserving the ability to access the list from the longer URL).

comment:13 Changed 10 years ago by https://id.mayfirst.org/jamie

I just changed swish to use the cgi-bin URL by default and re-ran the lstech index. The other indexes will be updated the next time indexing happens (early tomorrow morning) unless you have particular lists that you want me to re-index for testing purposes today.

comment:14 Changed 10 years ago by https://id.mayfirst.org/grange

Still having problems with lstech email list search. I logged in as member grange@… and got to the archive page. Then I clicked on the search link and nothing happened.

Steve

comment:15 Changed 10 years ago by https://id.mayfirst.org/jamie

Thanks for the feedback Steve. I think I found the bug (I wasn't allowing periods in the cookie sanitizing process - I was testing with jamie@localhost as my username, rather than with a real email address which has periods).

I just pushed the fix to our live server - can you give it another shot?
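The kind of bug described in this comment is easy to reproduce. The sketch below is in Python rather than the perl of the real script, and the exact character sets are assumptions. It shows a whitelist that lacks the period passing a test address like jamie@localhost untouched while silently mangling a real email address.

```python
import re

# Assumed whitelist before the fix: letters, digits, @, +, _ and -.
DISALLOWED_BEFORE = re.compile(r"[^A-Za-z0-9@+_\-]")
# Assumed whitelist after the fix: the same set plus the period.
DISALLOWED_AFTER = re.compile(r"[^A-Za-z0-9@+_.\-]")


def sanitize(value, disallowed):
    """Strip every character that is not on the whitelist."""
    return disallowed.sub("", value)
```

Because a mangled address no longer matches the cookie that mailman issued, the check fails even for a logged-in member. Testing with at least one address that contains a period would have caught this.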

comment:16 Changed 10 years ago by https://id.mayfirst.org/grange

Yi-hah! Works like a charm. Thanks for sticking with this Jamie. While you are at it can you add a couple of navigation helps to the archive template pages - we need one to "return to main archive" and prolly to "search this list" if possible.

Thanks,

Steve

comment:17 Changed 10 years ago by https://id.mayfirst.org/jamie

Which archive template page do you mean? Can you post the URL?

comment:18 Changed 10 years ago by https://id.mayfirst.org/grange

I've attached screen shots of the pages I think need some navigation help.

steve

Changed 10 years ago by https://id.mayfirst.org/grange

comment:19 Changed 10 years ago by https://id.mayfirst.org/jamie

Ah - I see what you are saying. I'm wrestling with perl templating to see if I can change the templates to add the links. On the mailman side of things - I think we'll need links on the subject/date/author pages as well.

comment:20 Changed 8 years ago by https://id.mayfirst.org/jamie

The additional templating is still not done, but the rest of this system is in place and available for download here:

http://current.workingdirectory.net/pages/mailman-swish/

jamie

comment:21 Changed 6 years ago by https://id.mayfirst.org/ross

  • Keywords mailman-search needs-review added; mailman search removed
  • Status changed from new to assigned

comment:22 Changed 6 years ago by https://id.mayfirst.org/jamie

  • Priority changed from Medium to Low
